ABSTRACT

Consider a statistical decision problem which involves an observable
random variable $X$ whose distribution, which depends on an unknown
parameter $\lambda$, is known. Suppose that this problem occurs
repeatedly. The sequence $\lambda_1, \lambda_2, \ldots$ of parameter
values is regarded as the realization of the independent random
variables $\Lambda_1, \Lambda_2, \ldots$ with a common unknown a priori
probability distribution. The "empirical Bayes problem" refers to the
approximation of the Bayes decision function which would be optimal if
the a priori distribution were known in advance.

Following an introduction and history of the empirical Bayes problem in
Chapter I, five main areas are discussed. In Chapter II asymptotically
optimal empirical Bayes estimators and tests of hypotheses concerning
the true value $\lambda$ of the random variable $\Lambda$ are given.
Two methods of estimating the a priori distribution are given in
Chapter III.

In  Chapter  IV  the  empirical  Bayes  approach  is  used  in 
selecting  the  "best"  of  k  populations  or  a  subset  containing  the 
"best" .  Empirical  Bayes  procedures  are  obtained  in  the  case  where 
the  a  priori  distribution  on  the  parameter  space  is  assumed  to 
belong  to  a  particular  parametric  family  and  the  case  where  the  a 
priori  distribution  is  assumed  to  possess  a  finite  absolute  mean. 
Chapter V deals with the non-parametric case in which the class of
conditional probability distributions of $X$ given $\lambda$ is not
restricted to a particular parametric family. Results analogous to
those of Chapter II are obtained, and an application is made to the
problem of selecting the "best" of k populations.

The last chapter is concerned with the closely related compound
decision problem. The unknown parameter values $\lambda_1, \lambda_2,
\ldots$ are regarded as a sequence of arbitrary unknown constants, and
information is obtained from the observed sequence $x_1, x_2, \ldots$
regarding the frequency distribution of the parameters. The aim is to
approximate the decision function which would be optimal if the
frequency distribution of the parameters were known in advance.
CHAPTER I


HISTORY  OF  THE  PROBLEM 

Consider a parameter space $\Omega$ such that to each parameter point
$\lambda \in \Omega$ there corresponds a conditional probability
distribution $P_\lambda$ over some sample space E with corresponding
conditional distribution function $F_\lambda$. An observable random
variable $X$ with values in E follows one of the distributions
$P_\lambda$, where $\lambda$ is unknown. An appropriate experiment
yields the observation $X = x$ from which we wish to infer the
corresponding value of the parameter $\lambda$ before taking some
contemplated action. If $\mathcal{D}$ is the set of all possible
actions, how can we best define a function $\delta(x)$ on E with values
in $\mathcal{D}$ which optimizes, in some sense, the selection of the
action to be taken in accordance with the results of observations?

The  following  examples  are  applications  of  the  above  problem. 

Example 1. Acceptance sampling. Consider lots containing N items. In
order to decide whether a lot should be accepted or not, it is customary
to select n items randomly from it for inspection. $X$ is the number of
defectives in the sample, and $\lambda$ is the number of defectives in
the lot. The parameter space $\Omega$ consists of the N + 1 integers
0,1,2,...,N. The set $\mathcal{D}$ of actions contemplated may include
only two elements, "accept lot" and "reject lot".
Example 2. Diagnostic tests. It is common practice to assume that the
seriousness of a disease can be expressed in terms of a parameter
$\lambda$. The parameter $\lambda$ cannot be measured directly for an
individual, but a series of diagnostic tests is given which provides
data for the estimation of $\lambda$. Thus the value of a random
variable $X$, whose distribution depends on $\lambda$, is observed.
Here again, $\lambda$ has a certain range $\Omega$ of possible values,
and, at least in some cases, the distribution of $X$ given $\lambda$
may be assumed to be known.


The "classical" solution to the above and similar problems is given by
Bayes' formula, which requires that $\lambda$ is the realization of a
random variable $\Lambda$ with a priori probability distribution
$G(\lambda) = P[\Lambda \le \lambda]$. Considering the overall sample
space $\Omega \times$ E, the random variables $(\Lambda, X)$ are
assumed to have a joint distribution. For given $G$ and any observed
$X = x$, Bayes' formula gives the conditional distribution of
$\Lambda$ given $X = x$, say


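$$dB(\lambda \mid x) = \frac{f_\lambda(x)\, dG(\lambda)}{\int_\Omega f_\lambda(x)\, dG(\lambda)} \tag{1.1}$$

where $f_\lambda(x)$ is the conditional probability density of $X$
given $\Lambda = \lambda$. $\lambda$ may be estimated by computing the
mean of the distribution (1.1).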
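
As a minimal sketch of how (1.1) and its mean can be evaluated, assume
(purely for illustration; none of this is in the source) that $G$ puts
mass on finitely many points and that $X$ given $\Lambda = \lambda$ is
Poisson. The names `support`, `weights`, `posterior` and
`posterior_mean` are hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """Assumed conditional density f_lambda(x): Poisson with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def posterior(x, support, weights, pmf=poisson_pmf):
    """Posterior weights of Lambda given X = x, following (1.1)."""
    joint = [w * pmf(x, lam) for lam, w in zip(support, weights)]
    total = sum(joint)  # denominator of (1.1): marginal density of X at x
    return [j / total for j in joint]

def posterior_mean(x, support, weights, pmf=poisson_pmf):
    """Mean of the distribution (1.1), used as the estimate of lambda."""
    post = posterior(x, support, weights, pmf)
    return sum(lam * p for lam, p in zip(support, post))

# Hypothetical two-point prior: lambda is 1 or 4 with equal probability.
support = [1.0, 4.0]
weights = [0.5, 0.5]
print(posterior_mean(0, support, weights))  # pulled toward 1
print(posterior_mean(6, support, weights))  # pulled toward 4
```

The computation requires $G$ to be known, which is exactly the
requirement questioned in the paragraphs that follow.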

Except in rare cases, Bayes' formula cannot be applied unless $\lambda$
is the realization of a random variable $\Lambda$ with an a priori
distribution. The data in Example 2 do not contain anything regarding
this distribution, and in Example 1, $\lambda$ may not be regarded as
the realization of a random variable.

The problem thus arose of devising a mathematical theory of using the
observable $X$ in order to provide a justifiable selection of decisions
or actions regarding the true value $\lambda$ of $\Lambda$, even in
those cases where the datum of a problem does not include the a priori
distribution $G(\lambda)$.

Early results by Bernstein and von Mises (around 1920) stated generally
that if we consider $\lambda$ to be the realization of a random
variable $\Lambda$ with a continuous probability density, then as the
number of conditionally independent observations is increased, the
conditional distribution of $\Lambda$ given $X = x$, given by (1.1),
tends to a calculable limit, independent of the a priori distribution
$G(\lambda)$. The requirement of a large number of observations was not
practical in most cases, and in others there was a reluctance to
consider $\lambda$ as a realization of a random variable.

Other attempts to solve the problem, such as the "principle of
insufficient reason", which states generally that if one is ignorant of
the a priori distribution, one "has the right" to assume that it is
uniform over $\Omega$, were of little help. In the 1930's, the
functions $\delta(x)$, now regarded as random variables, were the
principal subject of study and were called statistical decision
functions. Their properties were studied, and references were made to
"losses" incurred due to "errors", the primary concern being the case
where $\lambda$ was not the realization of


a  random  variable. 

After World War II there were protests that previous experience, which
would indicate that some values of the parameter $\lambda$ are more
probable than others, was being ignored by tests of hypotheses and
confidence intervals. In 1955, Robbins [12] utilized previous
experience in the following manner. It was assumed that $\lambda$ is
the realization of a random variable $\Lambda$ with unknown
distribution $G$ and that the problem of estimating the true value
$\lambda$ of $\Lambda$ from an observed value of $X$, a discrete random
variable, occurred repeatedly. Thus there was a sequence
$(\Lambda_i, X_i)$, $i = 1,2,\ldots,n$, where each $\Lambda_i$ was
assumed to have the same unknown marginal distribution $G$, and given
$\Lambda_i = \lambda_i$, the corresponding random variable $X_i$ was
assumed to have the known conditional probability density
$f_{\lambda_i}(x)$.

Assuming that the values $\lambda_1, \lambda_2, \ldots, \lambda_n$ are
all unobservable realizations of the random variables
$\Lambda_1, \Lambda_2, \ldots, \Lambda_n$ respectively, and that the
observations are limited to the values of $X_1, X_2, \ldots, X_n$,
Robbins used all these observations and the postulated existence of $G$
in order to estimate $\lambda_n$. This was called the empirical Bayes
problem and corresponds exactly with the situation in Example 2.

The empirical Bayes procedure shows essentially that if the amount of
previous experience is large, then the Bayes estimate of $\lambda$ that
could be computed with full knowledge of the a priori distribution of
this parameter cannot be much better than Robbins' empirical Bayes
estimate of $\lambda$. Robbins indicated the need for selecting the
estimate




that  was,  in  some  sense,  best. 




Shortly after the initial breakthrough by Robbins, Johns [6], in 1957,
studied the non-parametric empirical Bayes problem where the class of
conditional probability distributions of the random variable $X$ is not
restricted to a particular parametric family. After a lull of several
years, Robbins [13], [14] and Samuel [20] renewed work on the empirical
Bayes problem in the early 1960's. Both discussed the empirical Bayes
approach to statistical decision problems, and Robbins introduced
methods of estimating the a priori distribution.

In recent years there has been a surge of activity in this area.
Rutherford and Krutchkoff [16], [17] and [18] have extended results on
the parametric empirical Bayes approach to statistical hypothesis
testing, point estimation, and the estimation of the prior
distribution. Krutchkoff [8] has discussed the non-parametric empirical
Bayes approach to decision theory. Deely [2], [3] has applied the
method to the problem of selecting the best of k populations in both
the parametric and non-parametric cases.

Often, as in Example 1, there may be a reluctance to consider that the
unobservable parameter $\lambda$ is the realization of a random
variable $\Lambda$ with a fixed but unknown a priori distribution.
Suppose we have N disconnected problems which are of the hypothesis
testing or estimation type. In acceptance sampling, for example, there
may be N = 1000 lots of shoes, the i'th lot having $\lambda_i$
defectives. The problem is to decide whether the i'th lot should or
should not be accepted as conforming with the agreed specifications for
i = 1,2,...,N.

Initially  it  was  thought  that  the  N  identical  decision  problems 
should  be  treated  separately  in  the  best  possible  way,  perhaps  by  using 
most  powerful  tests.  In  1950,  Robbins  [11]  stated  that  if  a  large  number 
N  of  identical  but  unrelated  decision  problems  are  treated  simultaneously 
(the  compound  statistical  decision  problem),  then  in  certain  circumstances 
the  overall  expected  frequency  of  errors  of  both  kinds  will  be  below  the 
level  attainable  if  N  independent  applications  of  the  most  powerful 
test  were  made.  The  requirement  of  simultaneity  of  N  decisions  was 
later  dropped  and  an  appropriate  sequential  procedure  was  substituted 
for  the  original  solution. 

In  1955  Hannan  and  Robbins  [4]  studied  the  nonsequential  case 
where  the  component  problems  involve  decisions  between  any  two  completely 
specified  distributions.  More  recently,  Samuel  [19]  and  [21],  Hannan 
and  Van  Ryzin  [5],  and  Johns  [7]  have  extended  the  early  results  of 
Hannan  and  Robbins  for  both  the  sequential  and  nonsequential  cases. 

Van Ryzin [24] also considers the case where the component problem
involves a finite parameter space and a finite action space. At the
present time, work is also progressing on the standard (infinite state)
compound estimation problem with notable results having already been
achieved by Samuel [22] and Swain [23].

The compound decision problem appears to be related to the empirical
Bayes problem, which uses the accumulated experience, in the following
way. Suppose that before the N decisions have to be made, the
statistician knows the values of the parameters $\lambda_i$,
$i = 1,2,\ldots,N$, but not their order: i.e. the function
$G_N(\lambda) = \frac{1}{N}$ (number of indices $i$,
$i = 1,2,\ldots,N$, for which $\lambda_i \le \lambda$) is known. Then
at each decision the statistician can use the rule which is Bayes with
respect to $G_N$.


CHAPTER II


PARAMETRIC  EMPIRICAL  BAYES  PROBLEM  FOR  TESTING 
HYPOTHESES  AND  POINT  ESTIMATION 


2.1. Introduction.

Consider a statistical decision problem which involves an unknown real
parameter $\lambda$ belonging to some set $\Omega$ and an observable
random variable $X$ distributed according to the distribution function
$F_\lambda(x) = F(x \mid \lambda) = P(X \le x \mid \lambda)$ which is
known for each $\lambda \in \Omega$. If we assume that $\lambda$ is the
realization of an unobservable random variable $\Lambda$ with a known
distribution function $G(\lambda)$ (the a priori distribution), then on
the basis of observation $x \in \mathcal{X}$ of $X$ we wish to estimate
or make a test concerning the true value $\lambda$ of $\Lambda$.

To do this, let $\mathcal{D} = \{d\}$ be the set of possible decisions.
For example, $\mathcal{D}$ may consist of two decisions in the case of
testing a hypothesis, or decision $d \in \mathcal{D}$ may be a real
number in the case of a point estimation problem. The best decision
depends on the unknown $\lambda$.

Assume that for $d \in \mathcal{D}$ there exists a loss function
$L(\delta(x),\lambda) \ge 0$, the consequence of taking decision
$\delta(x)$ when the distribution of $X$ is $F_\lambda(x)$ and where
$\delta$ is a function which assigns a decision
$\delta(x) \in \mathcal{D}$ to each possible value $x$ of the random
variable $X$. For any $\delta$, the expected loss when $\lambda$ is the
parameter is


$$R(\delta,\lambda) = \int_{\mathcal{X}} L(\delta(x),\lambda)\, dF_\lambda(x) = \int_{\mathcal{X}} L(\delta(x),\lambda)\, f_\lambda(x)\, d\mu(x) \tag{2.1.1}$$

since we can assume without loss of generality that $F_\lambda(x)$ is
given in terms of its probability density $f_\lambda(x)$ with respect
to some measure $\mu$ on the sample space. The overall expected loss
(global risk) when the a priori distribution of $\Lambda$ is $G$ is


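$$R(\delta,G) = \int_\Omega R(\delta,\lambda)\, dG(\lambda) = \int_\Omega \int_{\mathcal{X}} L(\delta(x),\lambda)\, f_\lambda(x)\, d\mu(x)\, dG(\lambda) = \int_{\mathcal{X}} \varphi_G(\delta(x),x)\, d\mu(x) \tag{2.1.2}$$

where

$$\varphi_G(d,x) = \int_\Omega L(d,\lambda)\, f_\lambda(x)\, dG(\lambda) \ge 0. \tag{2.1.3}$$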


$R(\delta,G)$ is called the Bayes risk relative to $G$. If there exists
a decision function $\delta_G$ such that for a.e. ($\mu$) $x$

$$\varphi_G(\delta_G(x),x) = \min_d \varphi_G(d,x), \tag{2.1.4}$$


then since $\varphi_G \ge 0$, we have for any decision function
$\delta$,

$$R(\delta_G,G) = \int_{\mathcal{X}} \min_d \varphi_G(d,x)\, d\mu(x) \le R(\delta,G). \tag{2.1.5}$$

If we define

$$R(G) = R(\delta_G,G) = \int_{\mathcal{X}} \varphi_G(\delta_G(x),x)\, d\mu(x) = \min_\delta R(\delta,G), \tag{2.1.6}$$

then any decision function $\delta_G$ satisfying (2.1.4) minimizes
$R(\delta,G)$ and is called a Bayes decision function corresponding to
$G$, and $R(G)$ defined by (2.1.6) is called the Bayes envelope
function. Thus, assuming $G$ exists, we can use $R(\delta,G)$ to judge
how good any decision function $\delta$ is. Since $G$ is unknown,
$\delta_G$ is not directly available to us.
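
The pointwise minimization in (2.1.3) and (2.1.4) can be sketched as
follows when $G$ is known. This is an illustration under assumptions
that are not in the source: a prior with finitely many atoms, a Poisson
conditional density, two decisions labelled `"d0"` and `"d1"`, and a
piecewise linear loss with a hypothetical threshold `LAM_STAR`.

```python
import math

def poisson_pmf(x, lam):
    """Assumed conditional density f_lambda(x): Poisson with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def phi(d, x, loss, support, weights, pmf=poisson_pmf):
    """phi_G(d, x) of (2.1.3) for a prior G with finitely many atoms."""
    return sum(loss(d, lam) * pmf(x, lam) * w
               for lam, w in zip(support, weights))

def bayes_rule(x, decisions, loss, support, weights, pmf=poisson_pmf):
    """A decision d minimizing phi_G(d, x), as required by (2.1.4)."""
    return min(decisions, key=lambda d: phi(d, x, loss, support, weights, pmf))

LAM_STAR = 2.0  # hypothetical threshold separating "small" and "large" lambda

def loss(d, lam):
    """Illustrative loss: d0 is wrong when lam > LAM_STAR, d1 when lam < LAM_STAR."""
    if d == "d0":
        return max(lam - LAM_STAR, 0.0)
    return max(LAM_STAR - lam, 0.0)

support, weights = [1.0, 4.0], [0.5, 0.5]  # hypothetical two-point prior
for x in (0, 2, 8):
    print(x, bayes_rule(x, ["d0", "d1"], loss, support, weights))
```

Every quantity used by `bayes_rule` depends on `support` and `weights`,
that is, on $G$; the empirical Bayes approach described next replaces
this dependence by information extracted from past observations.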

In the parametric empirical Bayes approach, assuming $G$ is unknown, we
suppose that the decision problem just described occurs repeatedly with
independent random vectors $(\Lambda, X)$. Thus we have the sequence

$$(\Lambda_1,X_1),\ (\Lambda_2,X_2),\ \ldots,\ (\Lambda_n,X_n),\ \ldots \tag{2.1.7}$$

of independent pairs of random variables where the $\Lambda_i$'s,
$i = 1,2,\ldots,n$, are identically distributed according to $G$, and
where for $n = 1,2,\ldots$, the conditional distribution of $X_n$ given
that $\Lambda_n = \lambda$ is specified by the probability density
$f_\lambda(x)$. When a decision about $\Lambda_{n+1} = \lambda_{n+1}$
has to be made, we will have observed $x_1, x_2, \ldots, x_n, x_{n+1}$,
although the values of the $\lambda_i$ remain always unknown. Thus for
the decision about $\lambda_{n+1}$, we can use a function of $x_{n+1}$
whose form depends on $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$, with
$\mathbf{X}_n$ the corresponding random variable: i.e. a function

$$\delta_n(\cdot) = \delta_n(\mathbf{x}_n;\,\cdot). \tag{2.1.8}$$


Then we take action $\delta_n(x) \in \mathcal{D}$ with loss
$L(\delta_n(x),\lambda)$. Although the values $x_1, x_2, \ldots, x_n$
are independent of $\lambda$, these observations do contain information
about $G$, the common unconditional density of the
$X_1, X_2, \ldots, X_n, X_{n+1}$ with respect to $\mu$ being given by

$$f_G(x) = \int_\Omega f_\lambda(x)\, dG(\lambda). \tag{2.1.9}$$

It is hoped that for large $n$, $\delta_n$ will be close to the optimal
but unknown $\delta_G$ which we would use throughout if $G$ were known.

We define a sequential decision procedure to be the sequence
$D = \{\delta_n\}$ of the form (2.1.8) for (2.1.7) with values in
$\mathcal{D}$. From (2.1.2), the expected loss on the decision
$\delta_n$ will be


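$$R(\delta_n,G) = \int_{\mathcal{X}} \varphi_G(\delta_n(x),x)\, d\mu(x) \tag{2.1.10}$$

and hence the overall average loss will be

$$R_n(D,G) = \int_{\mathcal{X}} E\,\varphi_G(\delta_n(x),x)\, d\mu(x), \tag{2.1.11}$$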


where $E$ denotes expectation with respect to the $n$ independent
random variables $X_1, X_2, \ldots, X_n$. Since
$\varphi_G(\delta_G(x),x) = \min_\delta \varphi_G(\delta(x),x)$,
therefore $\varphi_G(\delta_G(x),x) \le E\,\varphi_G(\delta_n(x),x)$,
and hence

$$R(G) = \int_{\mathcal{X}} \varphi_G(\delta_G(x),x)\, d\mu(x) \le R_n(D,G) = \int_{\mathcal{X}} E\,\varphi_G(\delta_n(x),x)\, d\mu(x).$$

We wish to find a sequence of functions $D$ which for every $G$
contained in some class of a priori distributions is such that

$$\lim_{n \to \infty} R_n(D,G) = R(G), \tag{2.1.12}$$

the sequence of functions then being said to be asymptotically optimal.

2.2. General Results on Asymptotic Optimality.

Define

$$\Delta_G(d,x) = \int_\Omega [L(d,\lambda) - L(d_0,\lambda)]\, f_\lambda(x)\, dG(\lambda) \tag{2.2.1}$$

where $d_0$ is an arbitrary fixed element of $\mathcal{D}$. The
following theorem is due to Robbins [14].


Theorem 1. Let $G$ be such that

$$\int_\Omega L(\lambda)\, dG(\lambda) < \infty \tag{2.2.2}$$

where

$$0 \le L(\lambda) = \sup_d L(d,\lambda) < \infty. \tag{2.2.3}$$

Let

$$\Delta_n(d,x) = \Delta_n(X_1,X_2,\ldots,X_n;d;x) \tag{2.2.4}$$

be a sequence of functions such that for a.e. ($\mu$) $x$,

$$\sup_d\, |\Delta_n(d,x) - \Delta_G(d,x)| \xrightarrow{P} 0. \tag{2.2.5}$$

(2.2.6) If $\delta_n(x) = \delta_n(X_1,X_2,\ldots,X_n;x)$ is any
element $d \in \mathcal{D}$ such that

$$\Delta_n(d,x) \le \inf_d \Delta_n(d,x) + \varepsilon_n,$$

where $\varepsilon_n \to 0$ as $n \to \infty$ is any sequence of
constants, then $D = \{\delta_n\}$ is asymptotically optimal relative
to $G$.


Proof. To prove that $D$ is asymptotically optimal as defined by
(2.1.12), it is sufficient, on account of the dominated convergence
theorem, to show that

$$E\,\varphi_G(\delta_n(x),x) \le H(x) \ \text{for all } n, \ \text{where } \int_{\mathcal{X}} H(x)\, d\mu(x) < \infty, \tag{2.2.7}$$

and

$$\lim_{n \to \infty} E\,\varphi_G(\delta_n(x),x) = \varphi_G(\delta_G(x),x) \quad \text{a.e. } (\mu)\ x. \tag{2.2.8}$$
Letting

$$H(x) = \int_\Omega L(\lambda)\, f_\lambda(x)\, dG(\lambda) \ge 0, \tag{2.2.9}$$

we have from (2.1.3) and (2.2.3) that for any $D = \{\delta_n\}$,

$$\varphi_G(\delta_n(x),x) = \int_\Omega L(\delta_n(x),\lambda)\, f_\lambda(x)\, dG(\lambda) \le \int_\Omega L(\lambda)\, f_\lambda(x)\, dG(\lambda) = H(x) \quad \text{for all } n. \tag{2.2.10}$$

Also,

$$\int_{\mathcal{X}} H(x)\, d\mu(x) = \int_\Omega L(\lambda) \left[ \int_{\mathcal{X}} f_\lambda(x)\, d\mu(x) \right] dG(\lambda) = \int_\Omega L(\lambda)\, dG(\lambda) < \infty \tag{2.2.11}$$

by (2.2.2), so that from (2.2.10) and (2.2.11) we obtain (2.2.7). To
prove (2.2.8) it will suffice to prove that

$$\varphi_G(\delta_n(x),x) \xrightarrow{P} \varphi_G(\delta_G(x),x) \quad \text{a.e. } (\mu)\ x, \tag{2.2.12}$$

since from (2.2.10) and (2.2.11)
$\varphi_G(\delta_n(x),x) \le H(x) < \infty$ a.e. ($\mu$) $x$, enabling
us to apply the dominated convergence theorem again. If

$$L_0(x) = \int_\Omega L(d_0,\lambda)\, f_\lambda(x)\, dG(\lambda) < \infty, \tag{2.2.13}$$

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then  for  a  .e  .  (p.)x  , 

(2.2.14)  CPG(d>x)  =  L0(x)  +  AG(d>x)  • 

From  (2.1.4)  and  (2.2.14)  we  have 

AG^5n^X^X^  =  ^G^n^’3^  “  L(/X) 

>  9g(5g^x^x)  '  L0^X^  =  Ag^5g^X^X^  * 

Thus  for  any  given  e  >  0  and  n  sufficiently  large  we  have  by  (2.2.5) 
and  (2.2.6)  that,  with  probability  as  close  to  one  as  we  please, 


0  -  Ac(6n('X),X)  "  \SbG^’X^ 

=  [AG(&n(x),x)  -  An(5n(x),x)] 

(2.2.15)  +  [Afl(&n(x) ,x)  -  An(5G(x),x)] 

+  [An(6G(x),x)  -  Ag(&g(x),x)] 

<  e  +  €  +  e 

—  n 


Therefore 

(2.2.16)  AG(Sn(x),x)-^>  Ag(6q(x),x)  a.e.  (n)x 

and  by  (2.2.14), 


(PG(5n(x),x) 


9g(&g^x^x^ 


a  .e 


(n): 


which  proves  (2.2.12)  and  hence  the  theorem. 


When the decision space $\mathcal{D}$ is finite, we have the following
corollary, the proof of which is the same as for Theorem 1 with
appropriate changes.


Corollary 1. Let $\mathcal{D} = \{d_0, d_1, \ldots, d_m\}$ be a finite
set and $G$ be such that

$$\int_\Omega L(d_i,\lambda)\, dG(\lambda) < \infty, \quad i = 0,1,\ldots,m. \tag{2.2.17}$$

Let $\Delta_{i,n}(x) = \Delta_{i,n}(X_1,X_2,\ldots,X_n;x)$ for
$i = 1,2,\ldots,m$ and $n = 1,2,\ldots$ be such that for a.e. ($\mu$)
$x$,

$$\Delta_{i,n}(x) \xrightarrow{P} \int_\Omega [L(d_i,\lambda) - L(d_0,\lambda)]\, f_\lambda(x)\, dG(\lambda) = \Delta_G(d_i,x). \tag{2.2.18}$$

Setting $\Delta_{0,n}(x) = 0$ and defining

(2.2.19) $\delta_n(x) = d_k$, where $k$ is any integer
$0 \le k \le m$ such that

$$\Delta_{k,n}(x) = \min\,[\Delta_{0,n}(x) = 0,\ \Delta_{1,n}(x),\ \ldots,\ \Delta_{m,n}(x)],$$

then $D = \{\delta_n\}$ is asymptotically optimal relative to $G$.


If $m = 1$, we have the two-decision problem and obtain


Corollary 2. Let $\mathcal{D} = \{d_0, d_1\}$ and let $G$ be such that

$$\int_\Omega L(d_i,\lambda)\, dG(\lambda) < \infty, \quad i = 0,1, \tag{2.2.20}$$

and let $\Delta_n(x) = \Delta_n(X_1,X_2,\ldots,X_n;x)$ be such that for
a.e. ($\mu$) $x$,

$$\Delta_n(x) \xrightarrow{P} \Delta_G(x) = \int_\Omega [L(d_1,\lambda) - L(d_0,\lambda)]\, f_\lambda(x)\, dG(\lambda). \tag{2.2.21}$$

Defining

$$\delta_n(x) = \begin{cases} d_0, & \text{if } \Delta_n(x) \ge 0, \\ d_1, & \text{if } \Delta_n(x) < 0, \end{cases} \tag{2.2.22}$$

then $D = \{\delta_n\}$ is asymptotically optimal relative to $G$.

2.3. Application for Two Decision Problems.

A. Discrete Case.

Samuel [20] and Robbins [13] and [14] have applied the above to the
problem of hypothesis testing. Suppose we restrict ourselves to
discrete distributions of the type

$$f_\lambda(x) = \lambda^x\, g(x)\, h(\lambda) \quad \text{for } x = 0,1,2,\ldots \tag{2.3.1}$$

Some of the more common distributions which follow (2.3.1) are the
Poisson, geometric, and negative binomial distributions. $\mu$ will be
counting measure on $\mathcal{X} = \{0,1,2,\ldots\}$. Let us consider
the case where

$$L(d_1,\lambda) - L(d_0,\lambda) = \sum_{j=0}^{s} a_j \lambda^j, \tag{2.3.2}$$

that is, the difference between the losses is a polynomial in
$\lambda$. For

(2.3.2), (2.2.20) holds whenever $G$ has a finite moment of order $s$.
Then for (2.3.2) one has

$$\Delta_G(x) = \int_\Omega [L(d_1,\lambda) - L(d_0,\lambda)]\, f_\lambda(x)\, dG(\lambda) = \int_\Omega \Big( \sum_{j=0}^{s} a_j \lambda^j \Big) f_\lambda(x)\, dG(\lambda), \tag{2.3.3}$$

and for distributions of the type (2.3.1), we have from (2.1.9),

$$f_G(x) = \int_\Omega g(x)\, \lambda^x\, h(\lambda)\, dG(\lambda). \tag{2.3.4}$$

Thus in this case (2.3.3) becomes

$$\begin{aligned}
\Delta_G(x) &= \sum_{j=0}^{s} a_j \int_\Omega \lambda^{x+j}\, h(\lambda)\, g(x)\, dG(\lambda) \\
            &= \sum_{j=0}^{s} a_j\, g(x) \int_\Omega \lambda^{x+j}\, h(\lambda)\, dG(\lambda) \\
            &= \sum_{j=0}^{s} a_j\, f_G(x+j)\, \frac{g(x)}{g(x+j)}.
\end{aligned} \tag{2.3.5}$$

Now defining

$$\delta(x,y) = \begin{cases} 1, & \text{if } x = y, \\ 0, & \text{if } x \ne y, \end{cases} \tag{2.3.6}$$

and

$$f_n(x) = f_n(X_1,X_2,\ldots,X_n;x) = \frac{1}{n} \sum_{j=1}^{n} \delta(X_j,x), \tag{2.3.7}$$

and since from (2.3.6)

$$E\,\delta(X_j,x) = P(X_j = x) = f_G(x), \tag{2.3.8}$$

it follows from the law of large numbers that

$$f_n(x) \xrightarrow{P} f_G(x), \quad x = 0,1,2,\ldots, \tag{2.3.9}$$

and hence from (2.3.5) that

$$\Delta_n(x) = \sum_{j=0}^{s} a_j\, f_n(x+j)\, \frac{g(x)}{g(x+j)} \xrightarrow{P} \Delta_G(x). \tag{2.3.10}$$

Thus for the hypothesis testing problem with loss function defined by
(2.3.2) and for distributions of type (2.3.1), condition (2.2.21) is
satisfied so that Corollary 2 is applicable and (2.2.22) yields an
"optimal" empirical Bayes rule provided (2.2.20) holds.
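
The construction (2.3.7), (2.3.10) and (2.2.22) can be sketched in a
few lines. The function names, the decision labels `"d0"` and `"d1"`,
and the toy sample below are assumptions made for illustration only;
`coeffs` holds the coefficients $a_0, \ldots, a_s$ of (2.3.2) and `g`
is the factor $g(x)$ of (2.3.1).

```python
import math
from collections import Counter

def empirical_pmf(sample):
    """f_n of (2.3.7): relative frequency of each value in the sample."""
    counts = Counter(sample)
    n = len(sample)
    return lambda x: counts.get(x, 0) / n

def delta_n(x, sample, coeffs, g):
    """Delta_n(x) of (2.3.10) for the loss difference sum_j coeffs[j] * lambda**j."""
    f_n = empirical_pmf(sample)
    return sum(a * f_n(x + j) * g(x) / g(x + j) for j, a in enumerate(coeffs))

def decide(x, sample, coeffs, g):
    """Two-decision rule (2.2.22): d0 when Delta_n(x) >= 0, otherwise d1."""
    return "d0" if delta_n(x, sample, coeffs, g) >= 0 else "d1"

# Toy past observations; g(x) = 1/x! is the Poisson case of (2.3.1).
sample = [0, 0, 1, 1, 1, 2, 3]
g = lambda x: 1.0 / math.factorial(x)
coeffs = [2.0, -1.0]  # loss difference 2 - lambda (hypothetical choice)
for x in (1, 2):
    print(x, delta_n(x, sample, coeffs, g), decide(x, sample, coeffs, g))
```

Only the observed $X_j$ enter the rule; neither $G$ nor $h(\lambda)$ is
needed, which is the point of the reduction in (2.3.5).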

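Two common loss functions are particular cases of (2.3.2). The loss
structure

$$L(d_0,\lambda) = \begin{cases} 0, & \text{if } \lambda \le \lambda^*, \\ \lambda - \lambda^*, & \text{if } \lambda > \lambda^*, \end{cases} \qquad
L(d_1,\lambda) = \begin{cases} \lambda^* - \lambda, & \text{if } \lambda \le \lambda^*, \\ 0, & \text{if } \lambda > \lambda^*, \end{cases} \tag{2.3.11}$$

where $\lambda^*$ is a fixed constant, seems to be appropriate for the
problem of testing a one-sided null hypothesis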
$$H_0 : \lambda \le \lambda^* \tag{2.3.12}$$

concerning the value of a parameter $\lambda$. The two actions
available are $d_0$ = accept $H_0$ and $d_1$ = reject $H_0$. For
testing two-sided hypotheses of the kind

$$H_0 : |\lambda - \lambda^*| \le A \tag{2.3.13}$$

where $\lambda^*$ and $A > 0$ are fixed constants,


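$$L(d_0,\lambda) = \begin{cases} [\text{illegible}], & \text{if } |\lambda - \lambda^*| > A, \\ 0, & \text{if } |\lambda - \lambda^*| \le A, \end{cases} \qquad
L(d_1,\lambda) = \begin{cases} 0, & \text{if } |\lambda - \lambda^*| > A, \\ [\text{illegible}], & \text{if } |\lambda - \lambda^*| \le A, \end{cases} \tag{2.3.14}$$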

seems to be appropriate. The two possible actions again are $d_0$ =
accept $H_0$ and $d_1$ = reject $H_0$.


As a particular example, consider the problem of testing the one-sided
null hypothesis given by (2.3.12) concerning the value of a Poisson
parameter $\lambda$. Then $\Omega = \{0 < \lambda < \infty\}$ and we
adopt the loss structure (2.3.11) with appropriate actions $d_0$ and
$d_1$. Then

$$f_\lambda(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0,1,2,\ldots,$$

and from (2.3.1) we see that $g(x) = 1/x!$. Now

$$L(d_1,\lambda) - L(d_0,\lambda) = \lambda^* - \lambda = \sum_{j=0}^{1} a_j \lambda^j$$

from (2.3.11) so that $a_0 = \lambda^*$ and $a_1 = -1$. Thus from
(2.3.10) we get

$$\Delta_n(x) = \sum_{j=0}^{1} a_j\, f_n(x+j)\, \frac{g(x)}{g(x+j)} = \lambda^* f_n(x) - (x+1)\, f_n(x+1).$$

Thus if

$$\delta_n(x) = \begin{cases} d_0, & \text{if } \dfrac{(x+1)\, f_n(x+1)}{f_n(x)} \le \lambda^*, \\[2ex] d_1, & \text{if } \dfrac{(x+1)\, f_n(x+1)}{f_n(x)} > \lambda^*, \end{cases} \tag{2.3.15}$$

then $D = \{\delta_n\}$ is asymptotically optimal relative to every $G$
such that

$$\int_0^\infty \lambda\, dG(\lambda) < \infty.$$
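
A small simulation may help make (2.3.15) concrete. Everything about
the setup is an assumption chosen for illustration: a two-point prior
on $\lambda \in \{1, 4\}$, a threshold $\lambda^* = 2$, a fixed seed,
and the helper names `poisson_draw` and `eb_decision`. The rule is
evaluated in the equivalent form $\Delta_n(x) \ge 0$, which avoids
dividing by $f_n(x)$ when a value has not been observed.

```python
import math
import random
from collections import Counter

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def eb_decision(x, counts, n, lam_star):
    """Rule (2.3.15) via Delta_n(x) = lam_star * f_n(x) - (x + 1) * f_n(x + 1)."""
    delta = lam_star * counts[x] / n - (x + 1) * counts[x + 1] / n
    return "d0" if delta >= 0 else "d1"

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
# Hypothetical prior G: lambda is 1 or 4 with equal probability.
sample = [poisson_draw(rng.choice([1.0, 4.0]), rng) for _ in range(n)]
counts = Counter(sample)

LAM_STAR = 2.0
for x in range(8):
    print(x, eb_decision(x, counts, n, LAM_STAR))
```

For small $x$ the past data suggest a small $\lambda$ and the rule
accepts $H_0$; for large $x$ it rejects. The prior is used only to
generate the data and never enters `eb_decision`.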


B. Continuous Case.

Corollary 2 may also be applied to the case where the independent and
identically distributed random variables $\{X_i : i = 1,2,\ldots,n\}$
have a distribution function which is absolutely continuous with
respect to Lebesgue measure $\mu$ on $\mathcal{X} = (-\infty, \infty)$.

Let $f_\lambda(x)$ be the conditional density of $X$ given
$\Lambda = \lambda$. If $\Lambda$ has the a priori distribution $G$ on
$\Omega$, the (marginal) density


function of $x$ will be

$$f_G(x) = \int_\Omega f_\lambda(x)\, dG(\lambda). \tag{2.3.16}$$

In the empirical Bayesian situation, independent random variables
$\{X_i : i = 1,2,\ldots,n\}$ constitute a random sample of size $n$
from the distribution with density (2.3.16), and thus we wish to find
an estimate $f_n(x) = f_n(\mathbf{X}_n;x)$ such that for all $x$ and as
$n \to \infty$

$$f_n(x) \xrightarrow{P} f_G(x) \tag{2.3.17}$$

for all possible $f_G$, $f_n(x)$ then being a consistent estimate of a
density function.


One of the simplest classes of estimates satisfying (2.3.17)
with some optimality properties and given in [15] is

(2.3.18)  $f_n(x) = \dfrac{F_n(x+h_n) - F_n(x-h_n)}{2h_n}$

where $h_n = d\,n^{[\ldots]}$, $d > 0$ is some constant, and where $F_n(x)$ is the
empirical distribution function, viz.

(2.3.19)  $F_n(x) = F_n(X_n; x) = \dfrac{1}{n}$ (number of indices $i$, $i = 1,2,\ldots,n$,
for which $X_i \le x$).
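
As an illustration only, the window estimate (2.3.18) built on the empirical distribution function (2.3.19) might be sketched in Python as follows. The function names, the default constant `d = 1.0`, and in particular the exponent 1/5 in `h_n = d * n**(-1/5)` are assumptions of this sketch; they are not taken from the text, where the exponent of $n$ is not legible.

```python
import bisect


def empirical_cdf(sample):
    """Return F_n of (2.3.19): the fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)

    def F_n(x):
        # bisect_right counts the observations that are <= x
        return bisect.bisect_right(data, x) / n

    return F_n


def density_estimate(sample, d=1.0, exponent=0.2):
    """Return f_n of (2.3.18) with half-width h_n = d * n**(-exponent).

    The default exponent 1/5 is an assumption of this sketch.
    """
    n = len(sample)
    h_n = d * n ** (-exponent)
    F_n = empirical_cdf(sample)

    def f_n(x):
        # Difference quotient of the empirical distribution function
        return (F_n(x + h_n) - F_n(x - h_n)) / (2.0 * h_n)

    return f_n
```

Any half-width sequence could be passed in through `d` and `exponent`; the sketch only fixes the form of the estimate, not the rate at which the window shrinks.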


For the general case with loss functions satisfying (2.3.2)
and density functions corresponding to (2.3.1) of the form

(2.3.20)  $f_\lambda(x) = \begin{cases} \lambda^x g(x) h(\lambda) , & \text{for } a < x < \infty \\ 0 , & \text{otherwise} \end{cases}$

where $a$ is some constant and may be $-\infty$, we have

(2.3.21)  $\Delta_G(x) = \int_\Omega \Big( \sum_{j=0} a_j \lambda^j \Big) f_\lambda(x)\, dG(\lambda)$ .

It follows as before that when (2.3.17) holds,

(2.3.22)  $\Delta_n(x) = \Delta_n(X_n; x) = \sum_{j=0} a_j\, f_n(x+j)\, \dfrac{g(x)}{g(x+j)} \xrightarrow{P} \Delta_G(x)$ .

Thus if (2.2.20) holds, an optimal empirical Bayes rule for the problem
of testing hypotheses with loss function defined by (2.3.2) is given by
(2.2.22).


Many well-known density functions can be described by (2.3.20)
after the application of a simple transformation. As a particular example,
consider the gamma distribution given by

(2.3.23)  $f_\theta(x) = \begin{cases} [(2\theta^2)^{p/2}\, \Gamma(p/2)]^{-1}\, x^{(p/2)-1} \exp\left(-\dfrac{x}{2\theta^2}\right) , & \text{if } x > 0 \\ 0 , & \text{otherwise} \end{cases}$

where $p > 0$ and $0 < \theta < \infty$. If we let $\lambda = \exp\left(-\dfrac{1}{2\theta^2}\right)$, then
(2.3.23) becomes

(2.3.24)  $f_\lambda(x) = \begin{cases} \dfrac{\lambda^x\, x^{(p/2)-1}\, (-\log \lambda)^{p/2}}{\Gamma(p/2)} , & \text{if } x > 0 \\ 0 , & \text{otherwise} \end{cases}$

where $0 < \lambda < 1$. This has the form of (2.3.20).

Since the function $\lambda = \psi(\theta)$ which takes the parameter $\theta$
(for the range of $\theta$ for which $f_\theta(x)$ is a density function) into $\lambda$
is a strictly monotone function of $\theta$, there is a one-one correspondence
between hypothesis (2.3.12) or (2.3.13) stated in terms of $\theta$ and
the one stated in terms of $\lambda$. If the original loss function which is
given as a function of $\theta$ can be written in the form (2.3.2) as a
function of $\lambda$ and if the a priori distribution $G(\theta)$ corresponds to
the a priori distribution $G(\lambda)$, then the empirical Bayes rule given
by (2.2.22) with $\Delta_n(x)$ given by (2.3.22) will be "optimal in the
limit."
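
The substitution $\lambda = \exp(-1/(2\theta^2))$ can be checked numerically. The following is a small sketch, with function names chosen here for illustration, that evaluates the density in both parametrizations and confirms that they agree.

```python
import math


def gamma_density_theta(x, p, theta):
    """Density (2.3.23) in the original parameter theta."""
    if x <= 0:
        return 0.0
    scale = 2.0 * theta ** 2
    norm = scale ** (p / 2) * math.gamma(p / 2)
    return x ** (p / 2 - 1) * math.exp(-x / scale) / norm


def gamma_density_lambda(x, p, lam):
    """Density (2.3.24) after the substitution lam = exp(-1/(2 theta^2))."""
    if x <= 0:
        return 0.0
    return (lam ** x * x ** (p / 2 - 1)
            * (-math.log(lam)) ** (p / 2) / math.gamma(p / 2))


def psi(theta):
    """The strictly increasing map theta -> lambda on theta > 0."""
    return math.exp(-1.0 / (2.0 * theta ** 2))
```

Because `psi` is strictly increasing on $\theta > 0$, a one-sided hypothesis about $\theta$ translates into a one-sided hypothesis about $\lambda$ with the inequality in the same direction.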


2.4. General Methods for Obtaining Consistent Estimators in the Two
Decision Problem.

From (2.3.3) it is seen that

(2.4.1)  $\Delta_G(x) = \int_\Omega (\lambda^* - \lambda)\, f_\lambda(x)\, dG(\lambda) = \lambda^* f_G(x) - \int_\Omega \lambda f_\lambda(x)\, dG(\lambda)$

using loss function (2.3.11), and

(2.4.2)  $\Delta_G(x) = \int_\Omega [A^2 - (\lambda - \lambda^*)^2]\, f_\lambda(x)\, dG(\lambda)$

         $= (A^2 - \lambda^{*2})\, f_G(x) - \int_\Omega \lambda^2 f_\lambda(x)\, dG(\lambda) + 2\lambda^* \int_\Omega \lambda f_\lambda(x)\, dG(\lambda)$

using loss function (2.3.14). Thus if $\int_\Omega \lambda\, dG(\lambda)$ and $\int_\Omega \lambda^2\, dG(\lambda)$
are finite,

(2.4.3)  $E(\Lambda|x) = \dfrac{\int_\Omega \lambda f_\lambda(x)\, dG(\lambda)}{f_G(x)}$

(2.4.4)  $E(\Lambda^2|x) = \dfrac{\int_\Omega \lambda^2 f_\lambda(x)\, dG(\lambda)}{f_G(x)}$

and if $E_n(\Lambda|x)$ and $E_n(\Lambda^2|x)$ are such that

(2.4.5)  $E_n(\Lambda|x) \xrightarrow{P} E(\Lambda|x)$

(2.4.6)  $E_n(\Lambda^2|x) \xrightarrow{P} E(\Lambda^2|x)$ ,

we have from Corollary 2 that equivalent sequential decision procedures
to (2.2.22) for testing the hypotheses of the type (2.3.12) and (2.3.13),
with loss functions (2.3.11) and (2.3.14) respectively, are given by

(2.4.7)  $\delta_n(x) = \begin{cases} d_0 , & \text{if } E_n(\Lambda|x) \le \lambda^* \\ d_1 , & \text{if } E_n(\Lambda|x) > \lambda^* \end{cases}$

and

(2.4.8)  $\delta_n(x) = \begin{cases} d_0 , & \text{if } E_n(\Lambda^2|x) - 2\lambda^* E_n(\Lambda|x) \le A^2 - \lambda^{*2} \\ d_1 , & \text{if } E_n(\Lambda^2|x) - 2\lambda^* E_n(\Lambda|x) > A^2 - \lambda^{*2} \end{cases}$

Rutherford and Krutchkoff [16] give two general methods for obtaining
the needed consistent estimators for $E(\Lambda|x)$ and $E(\Lambda^2|x)$.
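
Once estimates of the first two posterior moments are available, the two rules (2.4.7) and (2.4.8) reduce to simple comparisons. A minimal sketch follows; the function names and the string labels `"d0"` and `"d1"` are illustrative choices, not notation from the text.

```python
def decide_one_sided(e1, lam_star):
    """Rule (2.4.7): take d0 when the estimated posterior mean is <= lam_star."""
    return "d0" if e1 <= lam_star else "d1"


def decide_two_sided(e1, e2, lam_star, A):
    """Rule (2.4.8).

    The comparison is equivalent to asking whether the estimated value of
    E[(Lambda - lam_star)^2 | x] = e2 - 2*lam_star*e1 + lam_star^2
    is at most A^2.
    """
    lhs = e2 - 2.0 * lam_star * e1
    rhs = A ** 2 - lam_star ** 2
    return "d0" if lhs <= rhs else "d1"
```

Here `e1` and `e2` stand for $E_n(\Lambda|x)$ and $E_n(\Lambda^2|x)$, however they were obtained.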


A. General Method I.

Define class $\mathcal{C}_1$ of families of conditional distributions
by the following conditions:

(a) In the family $\{F_\lambda(x); \lambda \in \Omega\}$, each $F_\lambda(x)$ has a density with
respect to Lebesgue or counting measure.

(b) The density functions $f_\lambda(x)$ are such that whenever $f_\lambda(x) > 0$,

(2.4.9)  $\dfrac{1}{f_\lambda(x)}\, D_x^i f_\lambda(x) = Q_i(x,\lambda)$ ,  $i = 0,1,2$ ,

where $Q_i(x,\lambda)$ is a polynomial in $\lambda$ of degree $i$ and $D_x^i$ represents
the i'th partial derivative if the measure is Lebesgue, or the i'th
finite difference if the measure is a counting measure.

For each member of any family in $\mathcal{C}_1$ one can always find
functions $a_{ij}(x)$ such that

(2.4.10)  $\lambda^j = \sum_{i=0}^{j} a_{ij}(x)\, Q_i(x,\lambda)$ .

Hence we may write

(2.4.11)  $E(\Lambda^j|x) = \dfrac{\int_\Omega \lambda^j f_\lambda(x)\, dG(\lambda)}{f_G(x)}$

          $= \sum_{i=0}^{j} a_{ij}(x)\, \dfrac{\int_\Omega Q_i(x,\lambda)\, f_\lambda(x)\, dG(\lambda)}{\int_\Omega f_\lambda(x)\, dG(\lambda)}$

          $= \sum_{i=0}^{j} \dfrac{a_{ij}(x)}{f_G(x)}\, D_x^i \int_\Omega f_\lambda(x)\, dG(\lambda)$

          $= \sum_{i=0}^{j} a_{ij}(x)\, \dfrac{D_x^i f_G(x)}{f_G(x)}$

assuming that the order of differentiation and integration can be
interchanged. Thus we require consistent estimators of $f_G(x)$, $D_x f_G(x)$,
and $D_x^2 f_G(x)$. Consistent estimators for $f_G(x)$ are given by (2.3.7)
and (2.3.18) for the discrete and continuous cases respectively. Also,
Rutherford has shown that

(2.4.12)  $f_n^{[1]}(x) = \begin{cases} f_n(x+1) - f_n(x) , & \text{for } X \text{ discrete} \\ \dfrac{f_n(x+h_n) - f_n(x)}{h_n} , & \text{for } X \text{ continuous} \end{cases}$

where $h_n = d\,n^{[\ldots]}$, $d > 0$ is some constant, and

(2.4.13)  $f_n^{[2]}(x) = \begin{cases} f_n(x+2) - 2f_n(x+1) + f_n(x) , & \text{for } X \text{ discrete} \\ \dfrac{f_n(x+2h_n) - 2f_n(x+h_n) + f_n(x)}{h_n^2} , & \text{for } X \text{ continuous} \end{cases}$

are consistent estimators for $D_x f_G(x)$ and $D_x^2 f_G(x)$ respectively.
Thus, writing $f_n^{[0]}(x) = f_n(x)$,

(2.4.14)  $E_n(\Lambda^j|x) = \sum_{i=0}^{j} a_{ij}(x)\, \dfrac{f_n^{[i]}(x)}{f_n(x)}$ ,  $j = 1,2$ ,

are consistent estimators for $E(\Lambda|x)$ and $E(\Lambda^2|x)$ respectively since
the ratio of consistent estimators is a consistent estimator of the
ratio.

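As an example of this method consider the important case where
$X$ has a normal distribution with unknown mean $\lambda$ and known variance
$\sigma^2$ so that

(2.4.15)  $f_\lambda(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\dfrac{(x-\lambda)^2}{2\sigma^2} \right]$ ,  $-\infty < x < \infty$ ,

where $\sigma > 0$ and $-\infty < \lambda < \infty$. The family of distributions corresponding
to this density function is in the class $\mathcal{C}_1$ since condition (a) is
satisfied and condition (b) is satisfied since for $i = 0$,

$\dfrac{1}{f_\lambda(x)}\, D_x^0 f_\lambda(x) = 1 = Q_0(x,\lambda)$ ,

for $i = 1$,

$\dfrac{\partial f_\lambda(x)}{\partial x} = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\dfrac{(x-\lambda)^2}{2\sigma^2} \right] \left[ -\dfrac{2(x-\lambda)}{2\sigma^2} \right] = -\dfrac{(x-\lambda)}{\sigma^2}\, f_\lambda(x)$ ,

giving

$\dfrac{1}{f_\lambda(x)} \dfrac{\partial f_\lambda(x)}{\partial x} = -\dfrac{x}{\sigma^2} + \dfrac{\lambda}{\sigma^2} = Q_1(x,\lambda)$ ,

and for $i = 2$,

$\dfrac{\partial^2 f_\lambda(x)}{\partial x^2} = -\dfrac{1}{\sigma^2}\, f_\lambda(x) + \dfrac{(\lambda - x)^2}{\sigma^4}\, f_\lambda(x)$ ,

giving

$\dfrac{1}{f_\lambda(x)} \dfrac{\partial^2 f_\lambda(x)}{\partial x^2} = -\dfrac{1}{\sigma^2} + \dfrac{x^2}{\sigma^4} - \dfrac{2\lambda x}{\sigma^4} + \dfrac{\lambda^2}{\sigma^4} = Q_2(x,\lambda)$ .

Then from (2.4.10),

$\lambda = \sum_{i=0}^{1} a_{i1}(x)\, Q_i(x,\lambda) = (x)(1) + (\sigma^2)\left( -\dfrac{x}{\sigma^2} + \dfrac{\lambda}{\sigma^2} \right)$

implies that $a_{01}(x) = x$ and $a_{11}(x) = \sigma^2$, and

$\lambda^2 = \sum_{i=0}^{2} a_{i2}(x)\, Q_i(x,\lambda) = (x^2 + \sigma^2)(1) + (2x\sigma^2)\left( -\dfrac{x}{\sigma^2} + \dfrac{\lambda}{\sigma^2} \right) + (\sigma^4)\left( -\dfrac{1}{\sigma^2} + \dfrac{x^2}{\sigma^4} - \dfrac{2\lambda x}{\sigma^4} + \dfrac{\lambda^2}{\sigma^4} \right)$

implies that $a_{02} = x^2 + \sigma^2$, $a_{12} = 2x\sigma^2$, and $a_{22} = \sigma^4$. From
(2.4.14) we see that the consistent estimators for $E(\Lambda|x)$ and $E(\Lambda^2|x)$
are given by

(2.4.16)  $E_n(\Lambda|x) = x + \sigma^2\, \dfrac{f_n^{[1]}(x)}{f_n(x)}$

and

(2.4.17)  $E_n(\Lambda^2|x) = x^2 + \sigma^2 + 2x\sigma^2\, \dfrac{f_n^{[1]}(x)}{f_n(x)} + \sigma^4\, \dfrac{f_n^{[2]}(x)}{f_n(x)}$

respectively.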
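
The normal example can be tried out numerically. The sketch below is illustrative only: the function names are chosen here, and `f_n` is any callable standing in for a density estimate. It applies the forward differences (2.4.12) and (2.4.13) and then the formulas (2.4.16) and (2.4.17).

```python
import math


def forward_differences(f_n, x, h):
    """Return f_n(x) and the difference quotients (2.4.12), (2.4.13)."""
    f0, f1, f2 = f_n(x), f_n(x + h), f_n(x + 2.0 * h)
    d1 = (f1 - f0) / h
    d2 = (f2 - 2.0 * f1 + f0) / h ** 2
    return f0, d1, d2


def normal_posterior_moments(f_n, x, sigma, h):
    """Estimates (2.4.16) and (2.4.17) for a normal mean, known sigma."""
    f0, d1, d2 = forward_differences(f_n, x, h)
    s2 = sigma ** 2
    e1 = x + s2 * d1 / f0
    e2 = x ** 2 + s2 + 2.0 * x * s2 * d1 / f0 + s2 ** 2 * d2 / f0
    return e1, e2
```

As a sanity check under an assumed $N(0,1)$ prior with $\sigma = 1$, the marginal density is $N(0,2)$; feeding that exact density in place of `f_n` at $x = 1$ should return values close to the known posterior moments $1/2$ and $3/4$.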


Other examples of the use of general Method I are:

(1) The gamma distribution with scale parameter $\lambda$ and density

(2.4.18)  $f_\lambda(x) = \begin{cases} \dfrac{1}{\Gamma(r)}\, \lambda^r x^{r-1} e^{-\lambda x} , & 0 < x < \infty \\ 0 , & \text{otherwise} \end{cases}$

where $\lambda > 0$ and $r > 0$ is known. The consistent estimators are

(2.4.19)  $E_n(\Lambda|x) = \dfrac{r-1}{x} - \dfrac{f_n^{[1]}(x)}{f_n(x)}$

and

(2.4.20)  $E_n(\Lambda^2|x) = \dfrac{r(r-1)}{x^2} - \dfrac{2(r-1)}{x}\, \dfrac{f_n^{[1]}(x)}{f_n(x)} + \dfrac{f_n^{[2]}(x)}{f_n(x)}$ .

(2) The Poisson distribution with mean $\lambda$ and density

(2.4.21)  $f_\lambda(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$ ,  $x = 0,1,2,\ldots$ ,

where $\lambda > 0$. The consistent estimators are

(2.4.22)  $E_n(\Lambda|x) = (x+1) \left( \dfrac{f_n^{[1]}(x)}{f_n(x)} + 1 \right)$

and

(2.4.23)  $E_n(\Lambda^2|x) = (x+1)(x+2) \left( \dfrac{f_n^{[2]}(x)}{f_n(x)} + 2\, \dfrac{f_n^{[1]}(x)}{f_n(x)} + 1 \right)$ .


(3) The geometric distribution with parameter $\lambda$ and density

(2.4.24)  $f_\lambda(x) = \lambda (1-\lambda)^{x-1}$ ,  $x = 1,2,\ldots$ ,

where $0 < \lambda < 1$. The consistent estimators are

(2.4.25)  $E_n(\Lambda|x) = -\dfrac{f_n^{[1]}(x)}{f_n(x)}$

and

(2.4.26)  $E_n(\Lambda^2|x) = \dfrac{f_n^{[2]}(x)}{f_n(x)}$ .

(4) The negative binomial distribution with density

(2.4.27)  $f_\lambda(x) = \dbinom{r+x-1}{x} (1-\lambda)^r \lambda^x$ ,  $x = 0,1,2,\ldots$ ,

where $0 < \lambda < 1$ and $r$ is a fixed integer. The consistent estimators
are

(2.4.28)  $E_n(\Lambda|x) = \dfrac{x+1}{r+x} \left( \dfrac{f_n^{[1]}(x)}{f_n(x)} + 1 \right)$

and

(2.4.29)  $E_n(\Lambda^2|x) = \dfrac{(x+1)(x+2)}{(r+x+1)(r+x)} \left( \dfrac{f_n^{[2]}(x)}{f_n(x)} + \dfrac{2 f_n^{[1]}(x)}{f_n(x)} + 1 \right)$ .

(5) The logarithmic distribution with density

(2.4.30)  $f_\lambda(x) = \dfrac{-1}{\log(1-\lambda)}\, \dfrac{\lambda^x}{x}$ ,  $x = 1,2,\ldots$ ,

where $0 < \lambda < 1$. The consistent estimators are

(2.4.31)  $E_n(\Lambda|x) = \dfrac{x+1}{x} \left( \dfrac{f_n^{[1]}(x)}{f_n(x)} + 1 \right)$

and

(2.4.32)  $E_n(\Lambda^2|x) = \dfrac{x+2}{x} \left( \dfrac{f_n^{[2]}(x)}{f_n(x)} + \dfrac{2 f_n^{[1]}(x)}{f_n(x)} + 1 \right)$ .
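
For the Poisson case in the list above, the estimators (2.4.22) and (2.4.23) can be computed directly from observed counts. The following is a minimal sketch; it assumes that the discrete density estimate $f_n$ is the relative frequency of each value in the sample, which is an assumption here since (2.3.7) lies outside this section, and the function names are illustrative.

```python
from collections import Counter


def relative_frequencies(sample):
    """Return f_n(x) = (number of observations equal to x) / n."""
    n = len(sample)
    counts = Counter(sample)
    return lambda x: counts[x] / n


def poisson_posterior_moments(sample, x):
    """Estimates (2.4.22) and (2.4.23); None when x was never observed."""
    f_n = relative_frequencies(sample)
    f0 = f_n(x)
    if f0 == 0.0:
        return None
    d1 = f_n(x + 1) - f0                        # first difference (2.4.12)
    d2 = f_n(x + 2) - 2.0 * f_n(x + 1) + f0     # second difference (2.4.13)
    e1 = (x + 1) * (d1 / f0 + 1.0)
    e2 = (x + 1) * (x + 2) * (d2 / f0 + 2.0 * d1 / f0 + 1.0)
    return e1, e2
```

Expanding the differences shows that `e1` equals $(x+1) f_n(x+1)/f_n(x)$ and `e2` equals $(x+1)(x+2) f_n(x+2)/f_n(x)$, so the estimate is undefined at values of $x$ that do not occur in the sample; the sketch returns `None` there rather than guessing.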


B. General Method II.

Assume that $X$ is a vector of $k \ge 3$ independent random
variables each with the distribution function $F_\lambda(x)$. A family will
be in the class $\mathcal{C}_2$ if

(c) there is a statistic $T$ sufficient for $\lambda$, and

(d) the distribution of $T$ has a density with respect to either Lebesgue
or a counting measure of the form

(2.4.33)  $f_\lambda(t,k) = \begin{cases} \varphi(k) \left( \dfrac{t}{\lambda} \right)^k h(t,\lambda) , & \text{for all } t \text{ in some interval} \\ 0 , & \text{otherwise} \end{cases}$

where $\varphi(k)$ is a function of $k$ only and $h(t,\lambda)$ is independent of
$k$. Considering the first $\nu$ random variables in the vector only, the
corresponding sufficient statistic has density

(2.4.34)  $f_\lambda(t,\nu) = \begin{cases} \varphi(\nu) \left( \dfrac{t}{\lambda} \right)^\nu h(t,\lambda) , & \text{for all } t \text{ in some interval} \\ 0 , & \text{otherwise} \end{cases}$

Then since

(2.4.35)  $\lambda^j f_\lambda(t,k) = \lambda^j \varphi(k) \left( \dfrac{t}{\lambda} \right)^k h(t,\lambda) = t^j\, \dfrac{\varphi(k)}{\varphi(k-j)}\, f_\lambda(t,k-j)$

and if

$\int_\Omega f_\lambda(t,k)\, dG(\lambda) = f_G(t,k)$ ,

(2.4.36)  $E(\Lambda^j|t) = \int_\Omega \lambda^j\, \dfrac{f_\lambda(t,k)}{f_G(t,k)}\, dG(\lambda) = t^j\, \dfrac{\varphi(k)}{\varphi(k-j)}\, \dfrac{\int_\Omega f_\lambda(t,k-j)\, dG(\lambda)}{f_G(t,k)} = t^j\, \dfrac{\varphi(k)}{\varphi(k-j)}\, \dfrac{f_G(t,k-j)}{f_G(t,k)}$ .

Letting $T_{\nu,i}$ represent the sufficient statistic based on
the first $\nu$ recorded random variables in the i'th replication of the
problem, and defining from (2.3.7) and (2.3.18),

(2.4.37)  $f_n(t,k-j) = \begin{cases} f_n(T_{k-j,1}, T_{k-j,2}, \ldots, T_{k-j,n}; t) , & \text{for } T \text{ discrete} \\ \dfrac{F_n(T_{k-j,1}, \ldots, T_{k-j,n}; t+h_n) - F_n(T_{k-j,1}, \ldots, T_{k-j,n}; t-h_n)}{2h_n} , & \text{for } T \text{ continuous} \end{cases}$

it follows that consistent estimators for $E(\Lambda|t)$ and $E(\Lambda^2|t)$ are

(2.4.38)  $E_n(\Lambda^j|t) = t^j\, \dfrac{\varphi(k)}{\varphi(k-j)}\, \dfrac{f_n(t,k-j)}{f_n(t,k)}$ ,  $j = 1,2$ ,

respectively. For a sufficient statistic $T$, $E(\Lambda|t)$ and $E(\Lambda^2|t)$
are equivalent to $E(\Lambda|x)$ and $E(\Lambda^2|x)$ respectively.


As an example of general Method II, consider the case where
$X$ has an exponential distribution with parameter $\lambda$. Assume that we
have observations on $k$ independent and identically distributed random
variables, each having the conditional density function

(2.4.39)  $f_\lambda(x) = \begin{cases} \dfrac{1}{\lambda}\, e^{-x/\lambda} , & 0 < x < \infty \\ 0 , & \text{otherwise} \end{cases}$

where $\lambda > 0$.

A statistic $T$ will be sufficient for $\lambda$ if

(2.4.40)  $L(x_1, x_2, \ldots, x_\nu | \lambda) = g(t|\lambda)\, k(x_1, x_2, \ldots, x_\nu)$

where $L(x_1, x_2, \ldots, x_\nu | \lambda)$ is the likelihood for the sample values
$x_1, x_2, \ldots, x_\nu$ given $\lambda$, $g(t|\lambda)$ is the conditional density function
for $T$, and $k(x_1, x_2, \ldots, x_\nu)$ is independent of $\lambda$. Considering only
the first $\nu$ independent observations, let

$t = \sum_{i=1}^{\nu} x_i$ .

We now derive the conditional density function for $T$. Now

$f_\lambda(x_1, x_2, \ldots, x_\nu) = \prod_{i=1}^{\nu} f_\lambda(x_i) = \dfrac{1}{\lambda^\nu} \exp\left[ -\dfrac{1}{\lambda} \sum_{i=1}^{\nu} x_i \right]$

where $\min(x_1, x_2, \ldots, x_\nu) > 0$. If we use the transformation

$u_1 = x_1 + x_2 + \cdots + x_\nu$ ,  $u_2 = x_2$ ,  $\ldots$ ,  $u_\nu = x_\nu$ ,

then

$x_1 = u_1 - u_2 - \cdots - u_\nu$ ,  $x_2 = u_2$ ,  $\ldots$ ,  $x_\nu = u_\nu$ ,

and the Jacobian of the transformation is

$\dfrac{\partial(x_1, x_2, \ldots, x_\nu)}{\partial(u_1, u_2, \ldots, u_\nu)} = \begin{vmatrix} \partial x_1/\partial u_1 & \partial x_1/\partial u_2 & \cdots & \partial x_1/\partial u_\nu \\ \partial x_2/\partial u_1 & \partial x_2/\partial u_2 & \cdots & \partial x_2/\partial u_\nu \\ \vdots & \vdots & & \vdots \\ \partial x_\nu/\partial u_1 & \partial x_\nu/\partial u_2 & \cdots & \partial x_\nu/\partial u_\nu \end{vmatrix} = \begin{vmatrix} 1 & -1 & \cdots & -1 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{vmatrix} = 1$ .

Then

$f_\lambda(u_1, u_2, \ldots, u_\nu) = f_\lambda(x_1(u_1, \ldots, u_\nu), \ldots, x_\nu(u_1, \ldots, u_\nu)) \left| \dfrac{\partial(x_1, x_2, \ldots, x_\nu)}{\partial(u_1, u_2, \ldots, u_\nu)} \right| = \dfrac{1}{\lambda^\nu}\, e^{-u_1/\lambda}$ ,

if $u_1 - u_2 - \cdots - u_\nu > 0$, $u_2 > 0$, $\ldots$, $u_\nu > 0$,

and

$f_\lambda(u_1) = \begin{cases} \displaystyle \int_0^{u_1} \int_0^{u_1 - u_2} \cdots \int_0^{u_1 - u_2 - \cdots - u_{\nu-1}} \dfrac{1}{\lambda^\nu}\, e^{-u_1/\lambda}\, du_\nu \cdots du_3\, du_2 = \dfrac{1}{\Gamma(\nu)\, \lambda^\nu}\, u_1^{\nu-1} e^{-u_1/\lambda} , & \text{if } u_1 > 0 \\ 0 , & \text{if } u_1 \le 0 \end{cases}$

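Thus

$L(x_1, x_2, \ldots, x_\nu | \lambda) = \dfrac{1}{\lambda^\nu} \exp\left[ -\dfrac{1}{\lambda} \sum_{i=1}^{\nu} x_i \right] = \dfrac{1}{\lambda^\nu}\, e^{-u_1/\lambda}$

$= \left[ \dfrac{1}{\Gamma(\nu)\, \lambda^\nu}\, u_1^{\nu-1} e^{-u_1/\lambda} \right] \left[ \dfrac{\Gamma(\nu)}{u_1^{\nu-1}} \right] = f_\lambda(u_1)\, k(x_1, x_2, \ldots, x_\nu)$

and by (2.4.40), $U_1$ is sufficient for $\lambda$ since

$k(x_1, x_2, \ldots, x_\nu) = \dfrac{\Gamma(\nu)}{\left( \sum_{i=1}^{\nu} x_i \right)^{\nu-1}}$

is independent of $\lambda$. Therefore

$T = \sum_{i=1}^{\nu} X_i$

is a sufficient statistic for $\lambda$ and has conditional density function

(2.4.41)  $f_\lambda(t,\nu) = \begin{cases} \dfrac{1}{\lambda\, \Gamma(\nu)} \left( \dfrac{t}{\lambda} \right)^{\nu-1} e^{-t/\lambda} , & 0 < t < \infty \\ 0 , & \text{otherwise} \end{cases}$

Comparing (2.4.41) and (2.4.34) we see that $\varphi(k) = \dfrac{1}{\Gamma(k)}$ so that from
(2.4.38) the consistent estimates of $E(\Lambda|x)$ and $E(\Lambda^2|x)$ are

(2.4.42)  $E_n(\Lambda|x) = E_n(\Lambda|t) = t\, \dfrac{1/\Gamma(k)}{1/\Gamma(k-1)}\, \dfrac{f_n(t,k-1)}{f_n(t,k)} = \dfrac{t}{k-1}\, \dfrac{f_n(t,k-1)}{f_n(t,k)}$

and

(2.4.43)  $E_n(\Lambda^2|x) = E_n(\Lambda^2|t) = \dfrac{t^2}{(k-1)(k-2)}\, \dfrac{f_n(t,k-2)}{f_n(t,k)}$

respectively.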
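
The identity behind (2.4.42) and (2.4.43) can be verified numerically for a prior with finitely many support points. The sketch below is illustrative: representing the prior as a list of `(lambda, weight)` pairs and the function names are choices made here, not part of the text. It compares the posterior moments computed directly with the Method II expression $t^j\, [\varphi(k)/\varphi(k-j)]\, f_G(t,k-j)/f_G(t,k)$.

```python
import math


def gamma_sum_density(t, nu, lam):
    """Density (2.4.41) of T = X_1 + ... + X_nu for exponential mean lam."""
    if t <= 0:
        return 0.0
    return (t / lam) ** (nu - 1) * math.exp(-t / lam) / (lam * math.gamma(nu))


def marginal(t, nu, prior):
    """f_G(t, nu) for a finite prior given as (lambda, weight) pairs."""
    return sum(w * gamma_sum_density(t, nu, lam) for lam, w in prior)


def posterior_moment_direct(t, k, prior, j):
    """E(Lambda^j | t) computed from its definition."""
    num = sum(w * lam ** j * gamma_sum_density(t, k, lam) for lam, w in prior)
    return num / marginal(t, k, prior)


def posterior_moment_method2(t, k, prior, j):
    """The same moment via (2.4.36) with phi(k) = 1 / Gamma(k)."""
    ratio = math.gamma(k - j) / math.gamma(k)   # phi(k) / phi(k - j)
    return t ** j * ratio * marginal(t, k - j, prior) / marginal(t, k, prior)
```

In the empirical Bayes setting the two marginals are unknown and are replaced by the estimates $f_n(t,k-j)$ and $f_n(t,k)$; the sketch only checks the algebra with a known prior.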


Other examples of the use of general Method II are:

(1) The normal distribution with known mean $\mu$, unknown variance $\lambda$,
and with conditional density function

(2.4.44)  $f_\lambda(x) = \dfrac{1}{\sqrt{2\pi\lambda}} \exp\left[ -\dfrac{(x-\mu)^2}{2\lambda} \right]$ ,  $-\infty < x < \infty$ ,

where $\lambda > 0$. If we consider the first $2\nu$ of the observations on $2k$
independent and identically distributed random variables each with
conditional density function (2.4.44), then

$t = \sum_{i=1}^{2\nu} (x_i - \mu)^2$

is a sufficient statistic for $\lambda$ with conditional density

(2.4.45)  $f_\lambda(t,\nu) = \begin{cases} \dfrac{t^{\nu-1}\, e^{-t/2\lambda}}{(2\lambda)^\nu\, \Gamma(\nu)} , & 0 < t < \infty \\ 0 , & \text{otherwise} \end{cases}$

The consistent estimators are

(2.4.46)  $E_n(\Lambda|x) = \dfrac{t}{2k-2}\, \dfrac{f_n(t,k-1)}{f_n(t,k)}$

and

(2.4.47)  $E_n(\Lambda^2|x) = \dfrac{t^2}{(2k-2)(2k-4)}\, \dfrac{f_n(t,k-2)}{f_n(t,k)}$ .

(2) The uniform distribution with one known terminal point, range $\lambda$,
and with conditional density function

(2.4.48)  $f_\lambda(x) = \begin{cases} \dfrac{1}{\lambda} , & 0 < x < \lambda < \infty \\ 0 , & \text{otherwise} \end{cases}$

If we consider the first $\nu$ of the observations on $k$ independent and
identically distributed random variables each with conditional density
function (2.4.48), then $T = \max[X_1, X_2, \ldots, X_\nu]$ is a sufficient statistic
for $\lambda$ with conditional density given by

(2.4.49)  $f_\lambda(t,\nu) = \begin{cases} \dfrac{\nu}{t} \left( \dfrac{t}{\lambda} \right)^\nu , & 0 < t < \lambda < \infty \\ 0 , & \text{otherwise} \end{cases}$

The consistent estimators are

(2.4.50)  $E_n(\Lambda|x) = \dfrac{tk}{k-1}\, \dfrac{f_n(t,k-1)}{f_n(t,k)}$

and

(2.4.51)  $E_n(\Lambda^2|x) = \dfrac{t^2 k}{k-2}\, \dfrac{f_n(t,k-2)}{f_n(t,k)}$ .


2.5. Point Estimation.

Using the loss function (2.3.11), suppose that instead of
testing the hypothesis (2.3.12) we wish to estimate the unknown parameter
$\lambda$. If $\mathcal{D} = \{d\}$ is the set of allowed estimates of $\lambda$, we have
proved in Theorem 1 that if $\delta(x)$ is any estimator of $\lambda$ and if

(2.5.1)  $\int_\Omega L(\lambda)\, dG(\lambda) < \infty$

and

(2.5.2)  $\varphi_G(\delta_n(X_n; x), x) \xrightarrow{P} \varphi_G(\delta(x), x)$  a.e. $(\mu)\,x$ ,

then

$\lim_{n \to \infty} E\, L(\delta_n(X_n; X), \Lambda) = E\, L(\delta(X), \Lambda)$

where $\delta(x)$ is a Bayes estimator. That is, the conditions (2.5.1) and
(2.5.2) imply that $\delta_n(X_n; x)$, $n = 1,2,\ldots$, is an asymptotically
optimal sequence of estimators.

Considering the special case of a loss function which is
continuous in $d$, Rutherford and Krutchkoff [16] give the following
simplification of Robbins' result.

Lemma 1. If

(2.5.3)  $L(d,\lambda)$ is continuous in $d$ for each $\lambda \in \Omega$ ,

(2.5.4)  $\int_\Omega L(\lambda)\, dG(\lambda) < \infty$

and

(2.5.5)  $\delta_n(X_n; x) \xrightarrow{P} \delta(x)$  a.e. $(\mu)\,x$ ,

where $\delta(x)$ is a Bayes estimator, then $\delta_n(X_n; x)$, $n = 1,2,\ldots$, is
an asymptotically optimal sequence of estimators.

Proof. From (2.5.4),

$\int_\Omega L(\lambda)\, dG(\lambda) = \int_\Omega L(\lambda) \left[ \int_{\mathcal{X}} f_\lambda(x)\, d\mu(x) \right] dG(\lambda) = \int_\Omega \int_{\mathcal{X}} L(\lambda)\, f_\lambda(x)\, d\mu(x)\, dG(\lambda) < \infty$

implies that $L(\lambda) f_\lambda(x)$ is integrable. Since $L(d,\lambda) f_\lambda(x)$ is bounded
uniformly by $L(\lambda) f_\lambda(x)$, then using a result of Cramér (p. 57 of [1]),
if $L(d,\lambda)$ is continuous in $d$ for each $\lambda$, then

$\varphi_G(d,x) = \int_\Omega L(d,\lambda)\, f_\lambda(x)\, dG(\lambda)$

is continuous in $d$. Then (2.5.5) implies (2.5.2) and therefore
$\delta_n(X_n; x)$ is an asymptotically optimal sequence of estimators.

For the usual squared error loss function which is continuous,
let $\omega(x)$ be an estimate of $\lambda$. The loss function when $x$ is observed
is

(2.5.6)  $(\omega(x) - \lambda)^2$ .

Then, in the discrete case, the expected squared deviation is

(2.5.7)  $E(\omega(X) - \Lambda)^2 = E\{ E[(\omega(X) - \Lambda)^2 \,|\, \Lambda = \lambda] \}$

         $= E\left\{ \sum_x f_\Lambda(x)\, (\omega(x) - \Lambda)^2 \right\}$

         $= \sum_x \int_\Omega f_\lambda(x)\, (\omega(x) - \lambda)^2\, dG(\lambda)$

         $=$ minimum

when we define $\omega(x)$ for each value of $x$ as that value $y = y(x)$
for which

(2.5.8)  $I(x) = \int_\Omega f_\lambda(x)\, (y - \lambda)^2\, dG(\lambda) =$ minimum .
For any fixed $x$,

(2.5.9)  $I(x) = \int_\Omega f_\lambda(x)\, (y^2 - 2y\lambda + \lambda^2)\, dG(\lambda)$

         $= y^2 \int_\Omega f_\lambda(x)\, dG(\lambda) - 2y \int_\Omega \lambda f_\lambda(x)\, dG(\lambda) + \int_\Omega \lambda^2 f_\lambda(x)\, dG(\lambda)$

         $= \int_\Omega f_\lambda(x)\, dG(\lambda) \left[ y - \dfrac{\int_\Omega \lambda f_\lambda(x)\, dG(\lambda)}{\int_\Omega f_\lambda(x)\, dG(\lambda)} \right]^2 + \int_\Omega \lambda^2 f_\lambda(x)\, dG(\lambda) - \dfrac{\left[ \int_\Omega \lambda f_\lambda(x)\, dG(\lambda) \right]^2}{\int_\Omega f_\lambda(x)\, dG(\lambda)}$

and therefore

$\int_\Omega f_\lambda(x)\, dG(\lambda) \left[ y - \dfrac{\int_\Omega \lambda f_\lambda(x)\, dG(\lambda)}{\int_\Omega f_\lambda(x)\, dG(\lambda)} \right]^2 = 0$

when

$y = \dfrac{\int_\Omega \lambda f_\lambda(x)\, dG(\lambda)}{\int_\Omega f_\lambda(x)\, dG(\lambda)}$

and $I(x)$ will be a minimum. Thus the Bayes estimator of $\lambda$ with
respect to an a priori distribution $G$ of $\Lambda$ is

(2.5.10)  $\omega_G(x) = \dfrac{\int_\Omega \lambda f_\lambda(x)\, dG(\lambda)}{\int_\Omega f_\lambda(x)\, dG(\lambda)} = E(\Lambda|x)$ .
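
The minimizing property of the posterior mean can be illustrated numerically. The sketch below assumes, purely for illustration, a Poisson conditional density and a prior with finitely many support points given as `(lambda, weight)` pairs; the function names are chosen here. It evaluates $I(x)$ of (2.5.8) and the estimator (2.5.10).

```python
import math


def poisson_pmf(x, lam):
    """Poisson conditional density f_lambda(x)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def risk_integrand(y, x, prior):
    """I(x) of (2.5.8) evaluated at the candidate estimate y."""
    return sum(w * poisson_pmf(x, lam) * (y - lam) ** 2 for lam, w in prior)


def bayes_estimate(x, prior):
    """The posterior mean (2.5.10) under a finite prior."""
    num = sum(w * lam * poisson_pmf(x, lam) for lam, w in prior)
    den = sum(w * poisson_pmf(x, lam) for lam, w in prior)
    return num / den
```

Since $I(x)$ is a quadratic in $y$ with positive leading coefficient, moving `y` away from `bayes_estimate(x, prior)` in either direction can only increase `risk_integrand`.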


It follows from Lemma 1 that if

$\int_\Omega L(\lambda)\, dG(\lambda) < \infty$

then any consistent sequence of estimators $E_n(\Lambda|x)$ for $E(\Lambda|x)$ is
asymptotically optimal. Such estimators can then be determined by
general Methods I and II in section 2.4 for a wide class of distributions.

When the set of allowed estimates

$\mathcal{D} = \{d\}$

is bounded, then

(2.5.11)  $L(\lambda) = \sup_d\, (d - \lambda)^2$

is bounded for each $\lambda \in \Omega$. This implies that

$\int_\Omega L(\lambda)\, dG(\lambda) < \infty$

and hence a sufficient condition to ensure that $E_n(\Lambda|x)$, $n = 1,2,\ldots$,
is an asymptotically optimal sequence of estimators is that

$E_n(\Lambda|x) \xrightarrow{P} E(\Lambda|x)$  a.e. $(\mu)\,x$ .

If the set of allowed estimates

$\mathcal{D} = \{d\}$

is unbounded,

(2.5.12)  $L(\lambda) = \sup_d\, (d - \lambda)^2$

is not finite and therefore the result is not true. However when $\Lambda$
has a bounded fourth moment in its a priori distribution $G$, it will
be shown that a consistent sequence of estimators for $E(\Lambda|x)$ will
allow us to find an asymptotically optimal sequence of estimators in a
squared error loss, point estimation problem.


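Let

(2.5.13)  $E(\Lambda^4) = \int_\Omega \lambda^4\, dG(\lambda) < B < \infty$

and define for any $\epsilon > 0$ and each $x$,

(2.5.14)  $E^{B,\epsilon}(\Lambda|x) = \begin{cases} \dfrac{B^{3/4}}{\epsilon} , & \text{when } E(\Lambda|x) > \dfrac{B^{3/4}}{\epsilon} \\ E(\Lambda|x) , & \text{when } -\dfrac{B^{3/4}}{\epsilon} \le E(\Lambda|x) \le \dfrac{B^{3/4}}{\epsilon} \\ -\dfrac{B^{3/4}}{\epsilon} , & \text{when } E(\Lambda|x) < -\dfrac{B^{3/4}}{\epsilon} \end{cases}$

and

(2.5.15)  $E_n^{B,\epsilon}(\Lambda|x) = \begin{cases} \dfrac{B^{3/4}}{\epsilon} , & \text{when } E_n(\Lambda|x) > \dfrac{B^{3/4}}{\epsilon} \\ E_n(\Lambda|x) , & \text{when } -\dfrac{B^{3/4}}{\epsilon} \le E_n(\Lambda|x) \le \dfrac{B^{3/4}}{\epsilon} \\ -\dfrac{B^{3/4}}{\epsilon} , & \text{when } E_n(\Lambda|x) < -\dfrac{B^{3/4}}{\epsilon} \end{cases}$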
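
The truncation in (2.5.14) and (2.5.15) is a clipping of the estimate to the interval $[-B^{3/4}/\epsilon,\ B^{3/4}/\epsilon]$. A one-function sketch, with an illustrative name:

```python
def truncate_estimate(e, B, eps):
    """Clip e to [-B**(3/4)/eps, B**(3/4)/eps] as in (2.5.14), (2.5.15)."""
    bound = B ** 0.75 / eps
    return max(-bound, min(bound, e))
```

The same function serves for both definitions: pass $E(\Lambda|x)$ to obtain $E^{B,\epsilon}(\Lambda|x)$ and $E_n(\Lambda|x)$ to obtain $E_n^{B,\epsilon}(\Lambda|x)$. A smaller `eps` widens the interval, so the truncation is active less often.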


We  have  then 


Theorem 2. If

(2.5.16)  $E(\Lambda^4) < B < \infty$

and

(2.5.17)  $E_n(\Lambda|x) \xrightarrow{P} E(\Lambda|x)$  a.e. $(\mu)\,x$ ,

then for any $\epsilon > 0$,

(2.5.18)  $\lim_{n \to \infty} E[E_n^{B,\epsilon}(\Lambda|X) - \Lambda]^2 \le R + \epsilon$

where

$R = E[E(\Lambda|X) - \Lambda]^2$ .

Proof. From (2.5.17), we see that

(2.5.19)  $E_n^{B,\epsilon}(\Lambda|x) \xrightarrow{P} E^{B,\epsilon}(\Lambda|x)$

for any $\epsilon > 0$. Suppose that our set of allowed estimates $\mathcal{D} = \{d\}$
is truncated at $-B^{3/4}/\epsilon$ and $B^{3/4}/\epsilon$. This gives the bounded case and, using
Lemma 1, condition (2.5.19) ensures that $E_n^{B,\epsilon}(\Lambda|x)$, $n = 1,2,\ldots$,
is an asymptotically optimal sequence of estimators. Therefore

(2.5.20)  $\lim_{n \to \infty} E[E_n^{B,\epsilon}(\Lambda|X) - \Lambda]^2 = E[E^{B,\epsilon}(\Lambda|X) - \Lambda]^2$ .

But

(2.5.21)  $E[E^{B,\epsilon}(\Lambda|X) - \Lambda]^2$

          $= E[E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X) + E(\Lambda|X) - \Lambda]^2$

          $= E[E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X)]^2$

          $\quad + 2E[(E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X))(E(\Lambda|X) - \Lambda)]$

          $\quad + E[E(\Lambda|X) - \Lambda]^2$ .

The second term of (2.5.21) is zero since

(2.5.22)  $E[(E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X))(E(\Lambda|X) - \Lambda)]$

          $= E\{ E[(E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X))(E(\Lambda|X) - \Lambda) \,|\, X] \}$

          $= E\{ [E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X)][E(\Lambda|X) - E(\Lambda|X)] \}$

          $= 0$ ,

and the first term of (2.5.21) becomes

(2.5.23)  $E[E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X)]^2$

          $= \int_\beta [E^{B,\epsilon}(\Lambda|x) - E(\Lambda|x)]^2\, dF_G(x)$

          $\le \int_\beta [E(\Lambda|x)]^2\, dF_G(x)$

where

$\beta = \left\{ x : |E(\Lambda|x)| > \dfrac{B^{3/4}}{\epsilon} \right\}$

and $F_G(x)$ is the marginal distribution of $X$. Applying the Schwarz
and Hölder inequalities successively to (2.5.23) we get

(2.5.24)  $E[E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X)]^2$

          $\le \sqrt{ \int_\beta [E(\Lambda|x)]^4\, dF_G(x) \int_\beta dF_G(x) }$

          $\le \sqrt{ \int_\beta E(\Lambda^4|x)\, dF_G(x) \int_\beta dF_G(x) }$ .

Now by (2.5.16),

(2.5.25)  $\int_\beta E(\Lambda^4|x)\, dF_G(x) \le \int_{\mathcal{X}} E(\Lambda^4|x)\, dF_G(x) = E(\Lambda^4) < B < \infty$

and

(2.5.26)  $\int_\beta dF_G(x) = P\left[ |E(\Lambda|X)| > \dfrac{B^{3/4}}{\epsilon} \right] \le \dfrac{\mathrm{Var}[E(\Lambda|X)]\, \epsilon^2}{B^{3/2}}$

by Tchebychev's inequality. Also, using the Schwarz inequality, we have

(2.5.27)  $\mathrm{Var}[E(\Lambda|X)] \le \mathrm{Var}(\Lambda) \le E(\Lambda^2) \le \sqrt{E(\Lambda^4)} < \sqrt{B}$

and so

(2.5.28)  $\int_\beta dF_G(x) < \dfrac{\sqrt{B}\, \epsilon^2}{B^{3/2}} = \dfrac{\epsilon^2}{B}$ .

Substituting (2.5.25) and (2.5.28) in (2.5.24), we see that

(2.5.29)  $E[E^{B,\epsilon}(\Lambda|X) - E(\Lambda|X)]^2 < \epsilon$ .

Therefore we see from (2.5.21) that

(2.5.30)  $\lim_{n \to \infty} E[E_n^{B,\epsilon}(\Lambda|X) - \Lambda]^2 \le R + \epsilon$ .
CHAPTER III

ESTIMATING THE PRIOR DISTRIBUTION

3.1. Robbins' General Method.

Returning to Corollary 2, recall that for $\mathcal{D} = \{d_0, d_1\}$ and
$\Omega = \{-\infty < \lambda < \infty\}$, an asymptotically optimal decision procedure $D$
exists relative to the class

$\mathcal{G} = \left\{ G : \int_\Omega L(d_i,\lambda)\, dG(\lambda) < \infty ,\ i = 0,1 \right\}$

whenever we can find a sequence $\Delta_n(x) = \Delta_n(X_1, X_2, \ldots, X_n; x)$ such that
for a.e. $(\mu)\,x$,

(3.1.1)  $\Delta_n(x) \xrightarrow{P} \Delta_G(x) = \int_\Omega [L(d_1,\lambda) - L(d_0,\lambda)]\, f_\lambda(x)\, dG(\lambda)$

for every $G \in \mathcal{G}$. We wish to construct a sequence $\Delta_n(x)$ by finding
a sequence $G_n(\lambda) = G_n(X_1, X_2, \ldots, X_n; \lambda)$ of random distribution functions
in $\lambda$ such that

(3.1.2)  $P[\lim_{n \to \infty} G_n(\lambda) = G(\lambda)$ at every continuity point $\lambda$ of $G] = 1$ .

Setting

(3.1.3)  $\Delta_n(x) = \int_\Omega [L(d_1,\lambda) - L(d_0,\lambda)]\, f_\lambda(x)\, dG_n(\lambda)$

and if the function

(3.1.4)  $[L(d_1,\lambda) - L(d_0,\lambda)]\, f_\lambda(x)$

is continuous in $\lambda$ for a.e. $(\mu)$ fixed $x$, then by the Helly-Bray
theorem (p. 180 of [9]), (3.1.1) holds.

Robbins [14] gives the following method for constructing a
particular sequence $G_n(\lambda)$ of random estimators of unknown $G(\lambda)$.
Assume that for every $\lambda \in \Omega = \{-\infty < \lambda < \infty\}$, $F_\lambda(x)$ is a specified
distribution function in $x$, and for every fixed $x \in \mathcal{X} = \{-\infty < x < \infty\}$,
$F_\lambda(x)$ is a Borel measurable function of $\lambda$. Define for any distribution
$G$ of $\lambda$ a distribution function in $x$ given by

(3.1.5)  $F_G(x) = \int_{-\infty}^{\infty} F_\lambda(x)\, dG(\lambda)$ .

Letting $X_1, X_2, \ldots$ be a sequence of independent random variables with
common distribution function $F_G$, define

(3.1.6)  $B_n(x) = B_n(X_1, X_2, \ldots, X_n; x) = \dfrac{1}{n}$ (number of terms $X_i$,
$i = 1,2,\ldots,n$, which are $\le x$).

The distance between any two distribution functions $F_1$, $F_2$ is defined
to be

(3.1.7)  $\rho(F_1, F_2) = \sup_x |F_1(x) - F_2(x)|$ .

Let $\mathcal{G}$ be any class of distribution functions of $\lambda$ such that $G \in \mathcal{G}$.
If $G_n(\lambda) = G_n(X_1, X_2, \ldots, X_n; \lambda) \in \mathcal{G}$ and is such that

(3.1.8)  $\rho(B_n, F_{G_n}) \le \inf_{G \in \mathcal{G}} \rho(B_n, F_G) + \epsilon_n$

where $\epsilon_n$ is any sequence of constants tending to zero as $n \to \infty$,
then the sequence $G_n$ is said to be effective for $\mathcal{G}$ if (3.1.2) holds
for every $G \in \mathcal{G}$.
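
The construction in (3.1.5) to (3.1.8) can be tried on a small scale. The sketch below is illustrative only and makes several assumptions not found in the text: $F_\lambda$ is taken to be the normal distribution function with mean $\lambda$ and unit variance, priors are finite lists of `(lambda, weight)` pairs, the class of candidate priors is a short explicit list, and $\epsilon_n = 0$. All function names are chosen here.

```python
import math


def normal_cdf(x, mean):
    """F_lambda(x) for a normal distribution with unit variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def mixture_cdf(prior):
    """F_G of (3.1.5) for a finite prior given as (lambda, weight) pairs."""
    return lambda x: sum(w * normal_cdf(x, lam) for lam, w in prior)


def sup_distance_to_empirical(sample, F):
    """rho(B_n, F) of (3.1.7) for a continuous F.

    The supremum is attained at a jump of B_n, so it suffices to compare
    F with the value of B_n just before and at each sample point.
    """
    data = sorted(sample)
    n = len(data)
    dist = 0.0
    for i, x in enumerate(data):
        Fx = F(x)
        dist = max(dist, abs((i + 1) / n - Fx), abs(i / n - Fx))
    return dist


def minimum_distance_prior(sample, candidates):
    """Pick the candidate G minimizing rho(B_n, F_G), i.e. (3.1.8) with eps_n = 0."""
    return min(candidates,
               key=lambda G: sup_distance_to_empirical(sample, mixture_cdf(G)))
```

With a finite candidate list the infimum in (3.1.8) is attained, which is why $\epsilon_n = 0$ is admissible in this sketch; over a richer class an approximate minimizer within $\epsilon_n$ of the infimum is all that the definition requires.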

Robbins then proves the following theorem which ensures that
under suitable conditions on the family of densities $f_\lambda(x)$, the
relation (3.1.2) holds whatever $G$ is.

Theorem 3. Assume that

(3.1.9)  for every fixed $x$, $F_\lambda(x)$ is a continuous function of $\lambda$,

(3.1.10)  the limits $F_{-\infty}(x) = \lim_{\lambda \to -\infty} F_\lambda(x)$ and $F_\infty(x) = \lim_{\lambda \to \infty} F_\lambda(x)$
exist for every $x$,

(3.1.11)  neither $F_{-\infty}$ nor $F_\infty$ is a distribution function,

and

(3.1.12)  if $G_1$, $G_2$ are any two distribution functions in $\lambda$
such that $F_{G_1} = F_{G_2}$, then $G_1 = G_2$.

Then the sequence $G_n$ defined by (3.1.8) is effective for the class
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of  all  distribution  functions  in  A  . 


Proof: By the Glivenko-Cantelli theorem (p. 20 of [9]), we have that

(3.1.13)  $P[\lim_{n \to \infty} \rho(B_n, F_G) = 0] = 1$.

Now using (3.1.8),

(3.1.14)  $\rho(F_{G_n}, F_G) \le \rho(F_{G_n}, B_n) + \rho(B_n, F_G)$
          $\le \inf_{G' \in \mathcal{G}} \rho(B_n, F_{G'}) + \epsilon_n + \rho(B_n, F_G)$
          $\le \rho(B_n, F_G) + \epsilon_n + \rho(B_n, F_G)$.

It follows from (3.1.13) that with probability one, $\rho(F_{G_n}, F_G) \to 0$; i.e. with probability one the sequence $X_1, X_2, \ldots$ is such that

(3.1.15)  $\lim_{n \to \infty} \int_{-\infty}^{\infty} F_\lambda(x)\, dG_n(\lambda) = \int_{-\infty}^{\infty} F_\lambda(x)\, dG(\lambda)$

uniformly in $x$.


Consider any fixed sequence $X_1, X_2, \ldots$ such that (3.1.15) holds. Let $G_{k_n}$ be any subsequence of $G_n$ such that $G_{k_n}(\lambda) \to G^*(\lambda)$ at every continuity point $\lambda$ of $G^*$, $G^*$ being a "weak" distribution function ($G^*(-\infty) \ge 0$, $G^*(\infty) \le 1$). Using the Helly-Bray theorem, we have from (3.1.9) and (3.1.10) that for every $x$,

(3.1.16)  $\lim_{n \to \infty} \int_{-\infty}^{\infty} F_\lambda(x)\, dG_{k_n}(\lambda) = \int_{-\infty}^{\infty} F_\lambda(x)\, dG^*(\lambda) + G^*(-\infty) F_{-\infty}(x) + [1 - G^*(\infty)] F_\infty(x)$,


and hence from (3.1.15),

(3.1.17)  $\int_{-\infty}^{\infty} F_\lambda(x)\, dG(\lambda) = \int_{-\infty}^{\infty} F_\lambda(x)\, dG^*(\lambda) + G^*(-\infty) F_{-\infty}(x) + [1 - G^*(\infty)] F_\infty(x)$

for every $x$.


Let us show now that $G^*$ is a distribution function; i.e. that $G^*(-\infty) = 0$ and $G^*(\infty) = 1$. $F_{-\infty}$ is the limit as $\lambda \to -\infty$ of $F_\lambda$ and hence is a nondecreasing function of $x$. By (3.1.11),

(3.1.18)  $0 \le F_{-\infty}(-\infty)$ and $F_{-\infty}(\infty) \le 1$.

Similarly, $F_\infty$ is a nondecreasing function of $x$ and

(3.1.19)  $0 \le F_\infty(-\infty)$, $F_\infty(\infty) \le 1$.

If we let $x \to -\infty$, then by Lebesgue's bounded convergence theorem and (3.1.17),

(3.1.20)  $G^*(-\infty) F_{-\infty}(-\infty) + [1 - G^*(\infty)] F_\infty(-\infty) = 0$

and hence if $G^*(-\infty) \ne 0$, then $F_{-\infty}(-\infty) = 0$, and if $G^*(\infty) \ne 1$, then $F_\infty(-\infty) = 0$. Similarly if $x \to \infty$ in (3.1.17), then

(3.1.21)  $G^*(-\infty) F_{-\infty}(\infty) + [1 - G^*(\infty)] F_\infty(\infty) = 1 - G^*(\infty) + G^*(-\infty)$

and hence if $G^*(-\infty) \ne 0$, then $F_{-\infty}(\infty) = 1$, and if $G^*(\infty) \ne 1$, then $F_\infty(\infty) = 1$.

If $a_n$ is any sequence of constants converging to a limit $a$ from the right, then subtracting (3.1.17) with $x = a$ from (3.1.17) with $x = a_n$ and letting $n \to \infty$, we have that

(3.1.22)  $G^*(-\infty)[F_{-\infty}(a+0) - F_{-\infty}(a)] + [1 - G^*(\infty)][F_\infty(a+0) - F_\infty(a)] = 0$.

Therefore $G^*(-\infty) \ne 0$ implies that $F_{-\infty}(a+0) = F_{-\infty}(a)$, and $G^*(\infty) \ne 1$ implies that $F_\infty(a+0) = F_\infty(a)$. Since a distribution function $F$ is by definition nondecreasing, continuous on the right, and such that $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$, it follows that if $G^*(-\infty) \ne 0$ then $F_{-\infty}$ is a distribution function, and if $G^*(\infty) \ne 1$ then $F_\infty$ is a distribution function; either conclusion contradicts (3.1.11). Thus $G^*(-\infty) = 0$ and $G^*(\infty) = 1$; i.e. $G^*$ is a distribution function. Now $F_G = F_{G^*}$ from (3.1.17), and therefore $G = G^*$ from (3.1.12). Since $G^*$ denoted the weak limit of any convergent subsequence of $G_n$, (3.1.2) holds and the theorem is proved.

Theorem 3 can be appropriately modified when the parameter space $\Omega$ is not the whole line. If $\Omega = \{0 \le \lambda < \infty\}$, for example, we obtain

Theorem 4. Assume that

(3.1.23)  for every fixed $x$, $F_\lambda(x)$ is a continuous function of $\lambda$,

(3.1.24)  the limit $F_\infty(x) = \lim_{\lambda \to \infty} F_\lambda(x)$ exists for every $x$,

(3.1.25)  $F_\infty$ is not a distribution function,

and

(3.1.26)  if $G_1$, $G_2$ are any two distribution functions in $\lambda$ which assign unit probability to $\Omega = \{0 \le \lambda < \infty\}$ such that $\int_\Omega F_\lambda(x)\, dG_1(\lambda) = \int_\Omega F_\lambda(x)\, dG_2(\lambda)$ for all $x$, then $G_1 = G_2$.

Then the sequence $G_n$ defined by (3.1.8) is effective for the class $\mathcal{G}$ of all distributions which assign unit probability to $\Omega$.


As an example of Theorem 4 consider the case where we have a Poisson parameter. Thus for $\lambda = 0$, let

(3.1.27)  $F_0(x) = 0$ for $x < 0$, and $F_0(x) = 1$ for $x \ge 0$,

and for $0 < \lambda < \infty$, let

(3.1.28)  $F_\lambda(x) = \sum_{0 \le i \le x} e^{-\lambda} \lambda^i / i!$.

Conditions (3.1.23), (3.1.24), and (3.1.25) are satisfied, and it will be shown that (3.1.26) is also satisfied.

If $G \in \mathcal{G}$, then

(3.1.29)  $F_G(x) = \int_\Omega F_\lambda(x)\, dG(\lambda) = \sum_{0 \le i \le x} \int_\Omega f_\lambda(i)\, dG(\lambda)$,

where

$f_0(i) = 1$ for $i = 0$, and $f_0(i) = 0$ for $i = 1, 2, \ldots$,

(3.1.30)  $f_\lambda(i) = e^{-\lambda} \lambda^i / i!$  for $i = 0, 1, \ldots$, and $0 < \lambda < \infty$.


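Now $F_G(0) = \int_\Omega f_\lambda(0)\, dG(\lambda)$, $F_G(n) - F_G(n-1) = \int_\Omega f_\lambda(n)\, dG(\lambda)$, $n = 1, 2, \ldots$, and $F_{G_1} = F_{G_2}$ implies that

(3.1.31)  $\int_\Omega f_\lambda(n)\, dG_1(\lambda) = \int_\Omega f_\lambda(n)\, dG_2(\lambda)$,  $n = 0, 1, 2, \ldots$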


Define the set functions

(3.1.32)  $H_j(B) = \dfrac{\int_B e^{-\lambda}\, dG_j(\lambda)}{\int_\Omega e^{-\lambda}\, dG_j(\lambda)}$,  $j = 1, 2$,

where $B$ is a Borel set in $\Omega$. Then $H_j$ is a probability measure on the Borel sets. Since from (3.1.31) and (3.1.30),

(3.1.33)  $c = \int_\Omega e^{-\lambda}\, dG_1(\lambda) = \int_\Omega f_\lambda(0)\, dG_1(\lambda) = \int_\Omega f_\lambda(0)\, dG_2(\lambda) = \int_\Omega e^{-\lambda}\, dG_2(\lambda)$,

we can write

(3.1.34)  $H_j(B) = \frac{1}{c} \int_B e^{-\lambda}\, dG_j(\lambda)$,  $j = 1, 2$,

where $0 < c < \infty$. Then

(3.1.35)  $\dfrac{dH_j}{dG_j} = \dfrac{e^{-\lambda}}{c}$,

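and for $n = 1, 2, \ldots$, and $j = 1, 2$,

(3.1.36)  $\int_\Omega \lambda^n\, dH_j(\lambda) = \frac{1}{c} \int_\Omega e^{-\lambda} \lambda^n\, dG_j(\lambda) = \frac{n!}{c} \int_\Omega f_\lambda(n)\, dG_j(\lambda)$.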


Therefore we have from (3.1.31) that

(3.1.37)  $\alpha_n = \int_\Omega \lambda^n\, dH_1(\lambda) = \int_\Omega \lambda^n\, dH_2(\lambda)$,  $n = 1, 2, \ldots$

Moreover, $0 \le f_\lambda(n) \le 1$ implies that $0 \le e^{-\lambda} \lambda^n \le n!$ for $0 \le \lambda < \infty$. Thus

(3.1.38)  $0 \le \alpha_n = \frac{1}{c} \int_\Omega e^{-\lambda} \lambda^n\, dG_j(\lambda) \le \frac{n!}{c}$

for $n = 1, 2, \ldots$, and hence

$\sum_{n=1}^{\infty} \dfrac{\alpha_n t^n}{n!} \le \dfrac{1}{c} \sum_{n=1}^{\infty} t^n < \infty$  for $0 < t < 1$.
A theorem of Cramér (p. 176 of [1]) states that if $\alpha_0 = 1, \alpha_1, \alpha_2, \ldots$ are the moments of a certain distribution function $F(x)$, all of which are assumed to be finite, and the series

$\sum_{r=0}^{\infty} \dfrac{\alpha_r t^r}{r!}$

is absolutely convergent for some $t > 0$, then $F(x)$ is the only distribution function that has moments $\alpha_0, \alpha_1, \ldots$ It follows from the above that $H_1 = H_2$. Since

(3.1.39)  $G_j(B) = \int_B \dfrac{dG_j}{dH_j}\, dH_j(\lambda) = c \int_B e^{\lambda}\, dH_j(\lambda)$,  $j = 1, 2$,

then $G_1(B) = G_2(B)$; i.e. $G_1 = G_2$, which shows that (3.1.26) holds and hence the sequence defined by (3.1.8) is effective.


3.2.  Squared Error Consistent Estimates of the Prior Distribution Function

Unlike the method of Robbins, where the choice of $G_n$ satisfying (3.1.8) is nonconstructive, Rutherford and Krutchkoff [17], [18] give the following method of using the sequence of observations $x_1, x_2, \ldots, x_n$ to give squared error consistent estimates of the prior distribution function $G(\lambda)$. Von Mises [25] in 1942 motivated the technique which will be developed by showing that if the first two moments of the prior distribution $G(\lambda)$ are known, then exact upper and lower bounds for $G(\lambda)$ can be found.


The general class $\mathcal{G}_p$, which has a known subclass $\mathcal{G}_p'$ to which $G(\lambda)$ belongs, and the class $\mathcal{F}_p$ to which the known conditional distribution function $F_\lambda(x)$ belongs, are described below. The class $\mathcal{G}_p$ is defined by the following conditions:

(a) $G(\lambda)$ is absolutely continuous with respect to Lebesgue measure.

(b) Its density function $g(\lambda)$ is determined completely and continuously by its first $p \ge 2$ moments in some open $p$-dimensional interval.

The subclass $\mathcal{G}_p'$ of $\mathcal{G}_p$ is defined by the additional condition:

(c) All $g(\lambda)$ in $\mathcal{G}_p'$ are determined by the same known continuous function described in (b).

Thus,  for  example,  when  the  fourth  moment  exists,  it  can  be 
shown  that  if  the  skewness  and  kurtosis  are  known,  then  the  prior 
distribution,  which  will  be  a  member  of  the  Pearson  family  of  curves, 
is  determined  completely.  The  Pearson  family  of  curves  can  be  represented 
by  the  solutions  of  the  first  order  differential  equation 


(3.2.1)  $\dfrac{dy}{dx} = \dfrac{y(m - x)}{a + bx + cx^2}$,


where  the  shapes  of  the  curves  depend  on  the  parameters  a,  b,  c,  and 
m  which  will  be  known  if  the  skewness  and  kurtosis  are  known.  The 
class $\mathcal{F}_p$ is defined by the condition

(d) for each $F_\lambda(x) \in \mathcal{F}_p$ there exist known functions $h_k(\cdot)$, $k = 1, 2, \ldots, p$, such that

(3.2.2)  $E[h_k(X) \mid \Lambda] = \Lambda^k$.


As an example of condition (d), suppose the observations were distributed according to a Poisson density with mean $\lambda$ and that $\Lambda$ was distributed according to an unknown Pearson distribution. Then for $p = 4$ the functions $h_k(x)$ are given by $h_1(x) = x$, $h_2(x) = x(x-1)$, $h_3(x) = x(x-1)(x-2)$, and $h_4(x) = x(x-1)(x-2)(x-3)$. Thus, for example,

$E[h_1(X) \mid \lambda] = \sum_{x=0}^{\infty} x\, \dfrac{\lambda^x e^{-\lambda}}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \dfrac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$.

Similarly we see that $E[h_k(X) \mid \lambda] = \lambda^k$, $k = 2, 3$, and $4$.
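The identity $E[h_k(X) \mid \lambda] = \lambda^k$ for the Poisson case can be checked numerically. The sketch below is only an illustration under stated assumptions: the series is truncated at a hypothetical `terms` value, and the function names are not from the source.

```python
import math

def falling_factorial(x, k):
    """h_k(x) = x (x-1) ... (x-k+1)."""
    result = 1
    for j in range(k):
        result *= x - j
    return result

def conditional_mean_h(k, lam, terms=100):
    """Truncated series for E[h_k(X) | lambda] when X is Poisson(lambda)."""
    total = 0.0
    pmf = math.exp(-lam)          # P(X = 0)
    for x in range(terms):
        total += falling_factorial(x, k) * pmf
        pmf *= lam / (x + 1)      # P(X = x + 1) from P(X = x)
    return total

if __name__ == "__main__":
    lam = 2.5
    for k in range(1, 5):
        print(k, conditional_mean_h(k, lam), lam ** k)
```

The probabilities are updated recursively rather than through `math.factorial`, which avoids overflow in the later terms of the series.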

Taking expectations with respect to $\Lambda$ in (3.2.2) we get

(3.2.3)  $E(\Lambda^k) = E[E(h_k(X) \mid \Lambda)] = E[h_k(X)]$

for $k = 1, 2, \ldots, p$, and these are all finite because of condition (b). Define the functions

(3.2.4)  $M_k(\mathbf{x}_n) = \frac{1}{n} \sum_{i=1}^{n} h_k(x_i)$,  $k = 1, 2, \ldots, p$,

where $\mathbf{x}_n$ represents the sequence of observations $x_1, x_2, \ldots, x_n$ with corresponding random variable $\mathbf{X}_n$, and the vectors

(3.2.5)  $M(\mathbf{x}_n) = (M_1(\mathbf{x}_n), \ldots, M_p(\mathbf{x}_n))$

and

(3.2.6)  $\mu = (E(\Lambda), E(\Lambda^2), \ldots, E(\Lambda^p))$.

From (3.2.3) and (3.2.4) we see that

(3.2.7)  $E[M(\mathbf{X}_n)] = \mu$,

and since $E(\Lambda^p) < \infty$, we have by the strong law of large numbers that

(3.2.8)  $M(\mathbf{X}_n) \to \mu$  a.s.,

and thus $M(\mathbf{X}_n)$ provides estimates of the first $p$ moments of the prior distribution. Assume that the prior density functions $g(\lambda)$ belong to a specific subclass $\mathcal{G}_p'$ whose members we denote by $g(\lambda; \mu)$, the notation indicating the dependence of the $g(\lambda)$ on their moments.
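The moment estimates in (3.2.4) can be sketched for the Poisson example. Everything specific here is an assumption made for illustration: the prior is taken to be Gamma with shape 3 and scale 1 (so $E(\Lambda) = 3$ and $E(\Lambda^2) = 12$), the Poisson sampler is Knuth's multiplication method, and the function names are hypothetical.

```python
import math
import random

def falling_factorial(x, k):
    """h_k(x) = x (x-1) ... (x-k+1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out

def moment_estimates(sample, p):
    """M_k = (1/n) * sum_i h_k(x_i) for k = 1..p, as in (3.2.4)."""
    n = len(sample)
    return [sum(falling_factorial(x, k) for x in sample) / n
            for k in range(1, p + 1)]

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def simulate(n, rng):
    """Assumed setup: Lambda ~ Gamma(shape=3, scale=1), X | Lambda ~ Poisson(Lambda)."""
    return [draw_poisson(rng.gammavariate(3.0, 1.0), rng) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    print(moment_estimates(simulate(20000, rng), 2))  # should be near [3, 12]
```

Only the observations enter `moment_estimates`; the unobserved parameter values are used solely to generate the simulated data.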


The estimator of $g(\lambda; \mu)$ will be the density function $g(\lambda; M(\mathbf{x}_n))$. For $p = 4$, $g(\lambda; M(\mathbf{x}_n))$ represents the solution of Pearson's differential equation (3.2.1) with $M(\mathbf{x}_n)$ substituted for $\mu$. Since from condition (b), $g(\lambda)$ are continuous functions of $\mu$ for every $\lambda$, and (3.2.8) holds,

(3.2.9)  $g(\lambda; M(\mathbf{X}_n)) \to g(\lambda; \mu)$  a.s.,  a.e. $\lambda$.

We define our estimate of $G(\lambda)$ by

(3.2.10)  $G_n(\lambda; M(\mathbf{x}_n)) = \int_{-\infty}^{\lambda} g(t; M(\mathbf{x}_n))\, dt$,


and  prove 


Theorem 5. If $G \in \mathcal{G}_p'$ and $F_\lambda \in \mathcal{F}_p$, then

(3.2.11)  $\lim_{n \to \infty} E[G_n(\lambda; M(\mathbf{X}_n)) - G(\lambda)]^2 = 0$  a.e. $\lambda$,

where the expectation is with respect to $\mathbf{X}_n$.


Proof. Defining

(3.2.12)  $G_n^*(\lambda; M(\mathbf{x}_n)) = \int_{-n}^{\lambda} g(t; M(\mathbf{x}_n))\, dt$

and

(3.2.13)  $G_n^*(\lambda) = \int_{-n}^{\lambda} g(t; \mu)\, dt$,

we obtain

(3.2.14)  $|G_n(\lambda; M(\mathbf{x}_n)) - G(\lambda)| \le |G_n(\lambda; M(\mathbf{x}_n)) - G_n^*(\lambda; M(\mathbf{x}_n))|$
          $+ |G_n^*(\lambda; M(\mathbf{x}_n)) - G_n^*(\lambda)|$
          $+ |G_n^*(\lambda) - G(\lambda)|$.

Each term on the right hand side of (3.2.14) converges almost surely to zero a.e. $\lambda$, which implies that

(3.2.15)  $G_n(\lambda; M(\mathbf{X}_n)) \to G(\lambda)$  a.s.,  a.e. $\lambda$.

Since $[G_n(\lambda; M(\mathbf{x}_n)) - G(\lambda)]^2$ is bounded for all $n$, we obtain (3.2.11) from the dominated convergence theorem, and the proof is complete.


CHAPTER IV


SELECTING  THE  BEST  OF  K  POPULATIONS 
PARAMETRIC  CASE 


4.1.  Introduction 

Suppose we have $k$ populations (categories, varieties, etc.) for which we can observe random variables $X_i$ whose distribution depends on an unknown parameter $\lambda_i$ for $i = 1, 2, \ldots, k$. Define the "best" population as that population with the largest $\lambda_i$. The empirical Bayes approach has been used by Deely [2] to obtain an optimal decision procedure which will either (1) select the best population; or (2) select a subset of $k$ populations which contains the best.


The observable random variable $X_i$ has for each of the $k$ populations the density $f_{\lambda_i}(x)$ with respect to some measure $\mu_i$ on the sample space $\mathcal{X}$, where the unknown parameter $\lambda_i \in \Omega$, a subset of the real numbers. Define $\lambda = (\lambda_1, \ldots, \lambda_k)$ to be the realization of the random vector $\Lambda = (\Lambda_1, \ldots, \Lambda_k)$ where $\lambda \in \Omega^k$, a subset of Euclidean $k$-space. Assuming that the $\Lambda_i$'s, $i = 1, 2, \ldots, k$, are independent random variables, we define $G_i(\lambda_i)$ to be an a priori distribution function of $\Lambda_i$, and

(4.1.1)  $G(\lambda) = \prod_{i=1}^{k} G_i(\lambda_i)$

to be an a priori distribution function on $\Omega^k$. The observation $x = (x_1, x_2, \ldots, x_k)$ of the random vector $X = (X_1, X_2, \ldots, X_k)$ is in $\mathcal{X}^k$, a subset of Euclidean $k$-space.

If we wish to select one population and say it is the best, then an appropriate action space is $\mathcal{D} = \mathcal{D}_1 = \{d_1, d_2, \ldots, d_k\}$ where action $d_i$ means: "say $\lambda_i$ is the largest." If we wish to select a subset of the $k$ populations and say that the best is in this subset, then our action space is $\mathcal{D} = \mathcal{D}_2 = \{S_1, S_2, \ldots, S_p\}$ where $p = 2^k - 1$ is the number of possible subsets of the integers $\{1, 2, \ldots, k\}$ where the empty set is excluded. Action $S_j$ means: "say subset $S_j$ contains the best population."


Suppose that for $d_i \in \mathcal{D}_1$, $i = 1, 2, \ldots, k$, or $S_j \in \mathcal{D}_2$, $j = 1, 2, \ldots, p$, there exists a loss function $L(\delta(x), \lambda) \ge 0$, the consequence of taking a decision in either $\mathcal{D}_1$ or $\mathcal{D}_2$ when $\lambda$ is the true parameter vector and where $\delta$ is a function which assigns a decision $\delta(x) \in \mathcal{D}_1$ or $\delta(x) \in \mathcal{D}_2$ to each possible value $x$ of the random vector $X$. The expected loss when $\lambda$ is the parameter vector is

(4.1.2)  $R(\delta, \lambda) = \int_{\mathcal{X}^k} L(\delta(x), \lambda)\, f_\lambda(x)\, d\mu(x)$,

where $f_\lambda(x) = \prod_{i=1}^{k} f_{\lambda_i}(x_i)$ with respect to measure $\mu = \mu_1 \times \mu_2 \times \cdots \times \mu_k$. The overall expected loss when the a priori distribution of $\Lambda$ is $G(\lambda)$ given by (4.1.1) is

(4.1.3)  $R(\delta, G) = \int_{\Omega^k} R(\delta, \lambda)\, dG(\lambda)$
         $= \int_{\Omega^k} \int_{\mathcal{X}^k} L(\delta(x), \lambda)\, f_\lambda(x)\, d\mu(x)\, dG(\lambda)$
         $= \int_{\mathcal{X}^k} \phi_G(\delta, x)\, d\mu(x)$,

where

(4.1.4)  $\phi_G(\delta, x) = \int_{\Omega^k} L(\delta(x), \lambda)\, f_\lambda(x)\, dG(\lambda)$,

by Fubini's theorem if $L(\delta(x), \lambda)\, f_\lambda(x)$ is integrable on $\mathcal{X}^k \times \Omega^k$.


If there exists a decision function $\delta_G$ such that for a.e. ($\mu$) $x$,

(4.1.5)  $\phi_G(\delta_G(x), x) \le \phi_G(\delta(x), x)$

for every $\delta(x)$, we have

(4.1.6)  $R(\delta_G, G) \le R(\delta, G)$

and $\delta_G$ is called a Bayes decision procedure with respect to $G$. For a finite action space $\mathcal{D} = \{d_1, d_2, \ldots, d_m\}$ we have from (4.1.5) that $\delta_G$ is defined by

(4.1.7)  $\delta_G(x) = d_j$ where $j$ is any integer $1, 2, \ldots, m$ such that $\phi_G(d_j, x) = \min_{1 \le i \le m} \{\phi_G(d_i, x)\}$.
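The rule (4.1.7) can be sketched for a prior with finite support, where the integral defining $\phi_G$ becomes a sum. The sketch below rests on assumptions that are not in the source: the populations are Poisson, the prior is a hypothetical two-point distribution on parameter vectors, the loss is the largest parameter minus the selected one, and all names are illustrative.

```python
import math

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def likelihood(x, lam):
    """f_lambda(x) = prod_i f_{lambda_i}(x_i) for independent Poisson populations."""
    out = 1.0
    for xi, li in zip(x, lam):
        out *= poisson_pmf(xi, li)
    return out

def linear_loss(i, lam):
    """Loss of declaring population i best: largest parameter minus lambda_i."""
    return max(lam) - lam[i]

def phi(i, x, prior):
    """phi_G(d_i, x) as a sum over the finite support of the prior."""
    return sum(linear_loss(i, lam) * likelihood(x, lam) * w for lam, w in prior)

def bayes_action(x, prior):
    """Index j minimising phi_G(d_j, x), together with all phi values."""
    values = [phi(i, x, prior) for i in range(len(x))]
    return min(range(len(x)), key=values.__getitem__), values

if __name__ == "__main__":
    prior = [((1.0, 5.0), 0.5), ((5.0, 1.0), 0.5)]  # hypothetical two-point prior
    print(bayes_action((0, 6), prior)[0])  # index of the selected population
```

Ties are broken in favour of the smallest index, which is one admissible reading of "any integer" in (4.1.7).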


$\delta_G$ is not directly available to us unless we have a complete knowledge of $G$. Assume that $G$ exists but is unknown and that the decision problem occurs repeatedly, giving us the random vectors

(4.1.8)  $(X_1, \Lambda_1), (X_2, \Lambda_2), \ldots, (X_n, \Lambda_n), \ldots$,

which are independent and such that the $\Lambda_i$'s, $i = 1, 2, \ldots, n$, are identically distributed according to $G$. The sequence (4.1.8) then contains information about the form of the unknown $G$.

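If

(4.1.9)  $\delta_n(\cdot) = \delta_n(X_1, X_2, \ldots, X_n; \cdot)$

is a mapping from $\mathcal{X}^{k(n+1)}$ to $\mathcal{D}$ and takes action $\delta_n(x) \in \mathcal{D}$ with loss $L(\delta_n(x), \lambda)$, then for the given sequence $D = \{\delta_n\}$, the expected loss on the decision $\delta_n$ will be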


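(4.1.10)  $\int_{\mathcal{X}^k} \phi_G(\delta_n(x), x)\, d\mu(x)$.

The overall average loss will be

(4.1.11)  $R(\delta_n, G) = \int_{\mathcal{X}^k} E_n\, \phi_G(\delta_n(x), x)\, d\mu(x)$,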


where $E_n$ denotes expectation with respect to the $n$ independent random vectors $X_1, X_2, \ldots, X_n$; i.e.

(4.1.12)  $E_n\, \phi_G(\delta_n(x), x) = \int_{\mathcal{X}^k} \cdots \int_{\mathcal{X}^k} \phi_G(\delta_n(x), x)\, f_G(x_1)\, d\mu(x_1) \cdots f_G(x_n)\, d\mu(x_n)$,

where

(4.1.13)  $f_G(x) = \int_{\Omega^k} f_\lambda(x)\, dG(\lambda)$,

and $\mu = \prod_{i=1}^{k} \mu_i$.

The sequence $D$ is said to be asymptotically optimal with respect to $G$ if

(4.1.14)  $\lim_{n \to \infty} R(\delta_n, G) = R(\delta_G, G)$

for every $G \in \mathcal{G}$, some class of a priori distributions on $\Omega^k$, and $\delta_n \in D$ is called an empirical Bayes procedure.


If $r$ observations are taken from each of the $k$ populations, then we get an $r \times k$ observation matrix $x^* \in \mathcal{X}^{kr}$ with the corresponding random matrix $X^*$. When $X_1^*, X_2^*, \ldots, X_n^*$ are the prior observations of the random matrix $X^*$, we can define an empirical Bayes procedure in an analogous manner.


4.2.  Selecting  a  Subset  Containing  the  Best  Population. 

It will now be shown that the Bayes procedure for selecting the best out of $k$ populations is precisely the same as the Bayes procedure which selects a subset of $k$ populations which contains the best for a particular loss function.


Considering $\mathcal{D}_2 = \{S_1, S_2, \ldots, S_p\}$ where $p = 2^k - 1$, assume for simplicity that the first $k$ sets of $\mathcal{D}_2$ are the $k$ one element subsets of $\{1, 2, \ldots, k\}$. Thus for $j = 1, 2, \ldots, k$ action $S_j$ means: "say population $j$ is the best;" and for $j = k+1, \ldots, p$ action $S_j$ means: "say the best is in the subset $S_j$." For this problem we assume that the loss function is of the form

(4.2.1)  $L(S_j, \lambda) = \sum_{q \in S_j} \alpha_{jq} (\lambda_{[k]} - \lambda_q)$,  $j = 1, 2, \ldots, p$,

where $\alpha_{jq} \ge 0$ for all $j$ and $q$, and $\lambda_{[k]} = \max_{1 \le j \le k} \lambda_j$. For the problem of selecting the best of $k$ populations we will use the linear loss structure

(4.2.2)  $L(d_i, \lambda) = \lambda_{[k]} - \lambda_i$,  $i = 1, 2, \ldots, k$.


Suppose that $r$ observations are taken from each of the $k$ populations, giving us an $r \times k$ observation matrix $x^* = (x_1^*, x_2^*, \ldots, x_k^*) \in \mathcal{X}^{kr}$ with the corresponding random matrix $X^* = (X_1^*, X_2^*, \ldots, X_k^*)$. Then

(4.2.3)  $f_\lambda(x^*) = \prod_{j=1}^{k} f_{\lambda_j}(x_j^*)$,

where $f_{\lambda_j}(x_j^*) = \prod_{i=1}^{r} f_{\lambda_j}(x_i^{(j)})$, $x_i^{(j)}$ being the $i$'th observation on $X_j$, and, analogous to (4.1.4), we have

(4.2.4)  $\phi_G(\delta, x^*) = \int_{\Omega^k} L(\delta(x^*), \lambda)\, f_\lambda(x^*)\, dG(\lambda)$.

If

(4.2.5)  $c_q = \int_{\Omega^k} (\lambda_{[k]} - \lambda_q)\, f_\lambda(x^*)\, dG(\lambda)$,

then from (4.2.4) and using loss function (4.2.1) we see that

$\phi_G(S_j, x^*) = \sum_{q \in S_j} \alpha_{jq} c_q$.


We  now  prove 


Theorem 6: In the loss function given by (4.2.1), let $\alpha_{ii} = \alpha > 0$ for $i = 1, 2, \ldots, k$. For $c_q$ given by (4.2.5), let $c_{[1]} = \min_{1 \le q \le k} c_q$. If

$\sum_{q \in S_j} \alpha_{jq} \ge \alpha$

for every $j = 1, 2, \ldots, p$, then $\min_{1 \le j \le p} \phi_G(S_j, x^*) = \min_{1 \le j \le k} \phi_G(S_j, x^*)$.

Proof: Note that each $c_q \ge 0$, since $\lambda_{[k]} \ge \lambda_q$. If

$\sum_{q \in S_j} \alpha_{jq} \ge \alpha$

for every $j = 1, 2, \ldots, p$, then

$\sum_{q \in S_j} \alpha_{jq} c_q \ge \sum_{q \in S_j} \alpha_{jq} c_{[1]} \ge \alpha c_{[1]}$,

which implies that

(4.2.6)  $\min_{1 \le i \le k} \phi_G(S_i, x^*) \le \phi_G(S_j, x^*)$  for every $j = 1, 2, \ldots, p$,

since $\alpha c_{[1]} = \min_{1 \le i \le k} \phi_G(S_i, x^*)$ and $\sum_{q \in S_j} \alpha_{jq} c_q = \phi_G(S_j, x^*)$. Since (4.2.6) is true for every $j = 1, 2, \ldots, p$, $\min_{1 \le i \le k} \phi_G(S_i, x^*) \le \min_{1 \le j \le p} \phi_G(S_j, x^*)$. Since $\min_{1 \le i \le k} \phi_G(S_i, x^*) \ge \min_{1 \le j \le p} \phi_G(S_j, x^*)$ also, $\min_{1 \le j \le p} \phi_G(S_j, x^*) = \min_{1 \le j \le k} \phi_G(S_j, x^*)$ and the proof is complete.

The  following  corollary  is  the  main  result  of  this  section. 


Corollary 3. In the loss function given in (4.2.1), let $\alpha_{jj} = \alpha > 0$ for $j = 1, 2, \ldots, k$. If

$\sum_{q \in S_j} \alpha_{jq} \ge \alpha$,

then a decision procedure $\delta_G$ which is Bayes with respect to $G$ when $\mathcal{D} = \mathcal{D}_1 = \{d_1, d_2, \ldots, d_k\}$ and $L(d_i, \lambda) = \alpha(\lambda_{[k]} - \lambda_i)$ for $i = 1, 2, \ldots, k$, is also Bayes with respect to $G$ when $\mathcal{D} = \mathcal{D}_2 = \{S_1, S_2, \ldots, S_p\}$, $p = 2^k - 1$, and $L(S_j, \lambda)$ is given by (4.2.1).
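The content of Theorem 6 and Corollary 3 can be checked by brute force for small $k$. The sketch below is only an illustration: the values $c_q$ and the weighting scheme in `make_weights` are hypothetical choices, with each subset of two or more populations spreading a given total weight equally over its members.

```python
from itertools import combinations

def nonempty_subsets(k):
    """All nonempty subsets of {0, ..., k-1}, singletons first."""
    return [s for size in range(1, k + 1) for s in combinations(range(k), size)]

def make_weights(k, alpha, total):
    """Hypothetical alpha_{jq}: singletons get alpha, larger subsets share `total` equally."""
    weights = {}
    for s in nonempty_subsets(k):
        share = alpha if len(s) == 1 else total / len(s)
        weights[s] = {q: share for q in s}
    return weights

def subset_risk(s, weights, c):
    """phi_G(S_j, x*) = sum over q in S_j of alpha_{jq} * c_q."""
    return sum(weights[s][q] * c[q] for q in s)

def min_risks(k, weights, c):
    """Return (minimum over singletons, minimum over all nonempty subsets)."""
    risks = {s: subset_risk(s, weights, c) for s in nonempty_subsets(k)}
    best_single = min(r for s, r in risks.items() if len(s) == 1)
    return best_single, min(risks.values())

if __name__ == "__main__":
    c = [0.7, 0.2, 0.5]                                 # assumed values of c_q
    print(min_risks(3, make_weights(3, 2.0, 2.0), c))   # weight sums equal alpha
    print(min_risks(3, make_weights(3, 2.0, 1.0), c))   # weight sums below alpha
```

In the first call the weight sums satisfy the condition of the theorem and the two minima coincide; in the second the condition fails and a two-element subset attains a strictly smaller value, which suggests the condition cannot simply be dropped.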


Proof. If $\delta_G$ is the Bayes decision procedure with respect to $G$ for $\mathcal{D}_1$, then by (4.1.7)

(4.2.7)  $\phi_G(\delta_G, x^*) = \min_{1 \le j \le k} \phi_G(d_j, x^*)$

for each $x^*$, where $\phi_G(\delta_G, x^*)$ is given by (4.2.4). But since $\delta_G$ must select some one of the first $k$ sets $S_1, S_2, \ldots, S_k$,

$\phi_G(\delta_G, x^*) = \min_{1 \le j \le k} \phi_G(S_j, x^*)$.

Thus from Theorem 6

$\phi_G(\delta_G, x^*) = \min_{1 \le j \le p} \phi_G(S_j, x^*)$,

which implies from (4.1.7) that the decision function $\delta_G$ is Bayes with respect to $G$ for $\mathcal{D}_2$, and the proof is complete.


Therefore  the  Bayes  and  empirical  Bayes  procedures  which  will 
be  derived  for  selecting  the  best  of  k  populations  are  also  Bayes  and 
empirical  Bayes  procedures  for  selecting  a  subset  which  contains  the  best 
population,  provided  the  conditions  of  Corollary  3  are  satisfied.  Thus 
only  one  population  is  selected  even  in  the  subset  formulation. 


4.3.  Bayes Procedures When G Belongs to a Particular Parametric Family.

Suppose that the a priori distribution $G$ belongs to some particular parametric family $\mathcal{G}$; $\mathcal{G}$ may be, for example, the class of all normal distributions. If we have $r$ observations on each population, let $X_1^*, X_2^*, \ldots, X_n^*$ be the prior observations of the random matrix $X^*$. For $G \in \mathcal{G}$ and with parameter $\theta = (\theta_1, \theta_2, \ldots)$, we wish to find an estimate $G_n$ of $G$, or, equivalently, an estimate $\theta_n$ of $\theta$ such that the Bayes procedure with respect to $G_n$, $\delta_{G_n}$, will also be an empirical Bayes procedure with respect to $G$. Since $G_n \in \mathcal{G}$, finding $\delta_{G_n}$ amounts to finding the Bayes procedure with respect to $G$. We now state

Theorem 7. Let $X_1^*, X_2^*, \ldots$ be independent random matrices, each consisting of $r \times k$ components which are independent random variables. Let $G$ be a $k$-dimensional distribution function on $\Omega^k$ which is independent of the $X_i^*$'s, and let $G_n$ be a distribution function on $\Omega^k$ which is a function of $X_1^*, X_2^*, \ldots, X_n^*$ and such that

(4.3.1)  $P\{\lim_{n \to \infty} G_n(\lambda) = G(\lambda)$ for every continuity point $\lambda$ of $G\} = 1$,

where probability $P$ is taken with respect to $X_1^*, X_2^*, \ldots$ Let the loss function $L(d_i, \lambda)$ and densities $f_\lambda(x^*)$ defined in (4.2.3) be such that $L(d_i, \lambda)\, f_\lambda(x^*)$ is bounded and continuous in $\lambda$ for every $d_i \in \mathcal{D}$, $i = 1, 2, \ldots, k$, and $x^* \in \mathcal{X}^{kr}$. Let

(4.3.2)  $L(\lambda) = \max_{1 \le i \le k} \{L(d_i, \lambda)\}$.

Then the sequence $\{\delta_{G_n}\}$ is asymptotically optimal with respect to $G$ if

(4.3.3)  $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$.




Proof . 


Define 


(4. J. 4)  A^d^x*)  =  r  [L(d.,A)  -  L(d  ,A)]  fA(x*)d  G(A)  , 

flk 


and 


(It. 3. 5)  Ag  (d.,x*)  =  f  [L(d.,A)  -  L(di;A)]  fA(x*)d  Gn(A) 

n  k  — 

a 


if  Gn  :*-s  suc^  that  (4.3.1)  is  true,  we  have  from  the  Helly-Bray 
theorem  that 


(it. 3. 6) 


Ac  Cdi^2i  ) - *  AG^di?-  ) 

n 


*7T 

since  L(d^,A)  f^(x  )  is  bounded  and  continuous  in  A  for  every  d_^  e  cp) 
*  kr 

and  x  e  JC  •  However  it  follows  from  Corollary  1  (with  appropriate 
notational  changes)  that  if 


(4.3.7)    $\Delta_{i,n}(x^*) = \Delta_{i,n}(x_1^*, x_2^*, \ldots, x_n^*; x^*)$

is a function of the prior observations such that for a.e. $x^*$,

(4.3.8)    $\Delta_{i,n}(x^*) \xrightarrow{P} \Delta_G(d_i, x^*)$

for $i = 1, 2, \ldots, k$, then the decision procedure $D = \{\delta_n\}$ defined by

(4.3.9)    $\delta_n(x^*) = d_j$ where $j$ is any positive integer $1, 2, \ldots, k$
           such that $\Delta_{j,n}(x^*) = \min_{1 \le i \le k} \{\Delta_{i,n}(x^*)\}$


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is  asymptotically  optimal  relative  to  G  if 


(4.3.10)    $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$.


Replacing (4.3.7) by (4.3.5) it follows from (4.3.6) that the decision
procedure $D = \{\delta_{G_n}\}$ defined by

(4.3.11)    $\delta_{G_n}(x^*) = d_j$ where $j$ is any positive integer $1, 2, \ldots, k$
            such that $\Delta_{G_n}(d_j, x^*) = \min_{1 \le i \le k} \{\Delta_{G_n}(d_i, x^*)\}$

is asymptotically optimal with respect to $G$ if

            $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$,

and the proof is complete.


Since the $j$ for which

            $\Delta_{G_n}(d_j, x^*) = \min_{1 \le i \le k} \{\Delta_{G_n}(d_i, x^*)\}$

is the same as the $j$ for which

            $\phi_{G_n}(d_j, x^*) = \min_{1 \le i \le k} \{\phi_{G_n}(d_i, x^*)\}$

where

            $\phi_{G_n}(d_i, x^*) = \int_{\Omega^k} L(d_i, \lambda)\, f_\lambda(x^*)\, dG_n(\lambda)$,

it follows from (4.1.7) that $\delta_{G_n}$ is a Bayes procedure with respect
to $G_n$; $\delta_{G_n}$ is also an empirical Bayes procedure with respect to $G$
due to (4.1.14).

If $G \in \mathcal{G}$, some parametric family with parameter
$\theta = (\theta_1, \theta_2, \ldots, \theta_k)$, then Theorem 7 suggests the following procedure:

(1) For specified $\mathcal{G}$ find the Bayes procedure $\delta_G$ as a function of
the observation $x^*$ and the parameter $\theta$ of $G$; i.e.
$\delta_G(x^*) = h(x^*, \theta)$.

(2) Verify that $L(d_i, \lambda) f_\lambda(x^*)$ is bounded and continuous in $\lambda$ for
any $d_i \in \mathcal{D}$. Verify that

            $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$.

(3) Find an estimate $\theta_n$ of $\theta$ such that $G_n \to G$ with probability
one. Then $\delta_{G_n}(x^*) = h(x^*, \theta_n)$ is an empirical Bayes procedure
with respect to $G \in \mathcal{G}$.
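
The plug-in structure of steps (1) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `make_plug_in_rule`, `toy_bayes_rule` and `toy_estimate` are hypothetical, the toy rule is not one of the procedures derived in this chapter, and step (2) is an analytical check that has no counterpart in the code.

```python
def make_plug_in_rule(h, estimate_theta):
    """Return the empirical Bayes rule x -> h(x, theta_n).

    h              : the Bayes rule of step (1), written as h(x, theta)
    estimate_theta : maps the prior observations to the estimate theta_n of step (3)
    """
    def rule(x, prior_observations):
        theta_n = estimate_theta(prior_observations)
        return h(x, theta_n)
    return rule


# Hypothetical ingredients, used only to exercise the wrapper.
def toy_bayes_rule(x, theta):
    # Assumed form: score each population by an equal-weight average of its
    # current observation and its prior parameter, then pick the largest.
    scores = [0.5 * xj + 0.5 * tj for xj, tj in zip(x, theta)]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_estimate(prior_observations):
    # Assumed estimator: the sample mean of each population's prior data.
    return [sum(obs) / len(obs) for obs in prior_observations]


rule = make_plug_in_rule(toy_bayes_rule, toy_estimate)
chosen = rule([1.0, 1.0], [[0.0, 0.0], [2.0, 4.0]])  # 0-based population index
```

The wrapper keeps the Bayes rule and the estimator separate, so the same function `h` serves both when $\theta$ is known and when it is replaced by an estimate.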


From (4.1.7) it follows, again, that the Bayes decision procedure
$\delta_G$ with respect to the a priori distribution

            $G = \prod_{j=1}^{k} G_j$

minimizes

(4.3.12)    $\phi_G(d_i, x^*) = \int_{\Omega^k} L(d_i, \lambda)\, f_\lambda(x^*)\, dG(\lambda)$

for $d_i \in \mathcal{D}$, where $i = 1, 2, \ldots, k$. Using the loss function (4.2.2) we have

(4.3.13)    $\phi_G(d_i, x^*) = \int_{\Omega^k} \lambda_{[k]}\, f_\lambda(x^*)\, dG(\lambda) - \int_{\Omega^k} \lambda_i\, f_\lambda(x^*)\, dG(\lambda)$.

Since the first term in (4.3.13) is independent of $i$, the Bayes
procedure is defined by

(4.3.14)    $\delta_G(x^*) = d_j$ where $j$ is any positive integer $1, 2, \ldots, k$
            such that $\int_{\Omega^k} \lambda_j f_\lambda(x^*)\, dG(\lambda) = \max_{1 \le i \le k} \left\{ \int_{\Omega^k} \lambda_i f_\lambda(x^*)\, dG(\lambda) \right\}$.


Suppose that $\Lambda_i$ has a density function $g_i(\lambda_i)$ with respect
to some measure $\nu_i$. Then if

(4.3.15)    $f_{G_i}(x_i^*) = \int_{\Omega} f_{\lambda_i}(x_i^*)\, dG_i(\lambda_i)$

and

(4.3.16)    $g_i(\lambda_i \mid x_i^*) = \dfrac{f_{\lambda_i}(x_i^*)\, g_i(\lambda_i)}{f_{G_i}(x_i^*)}$,

we have

(4.3.17)    $\int_{\Omega^k} \lambda_i\, f_\lambda(x^*)\, dG(\lambda) = E(\Lambda_i \mid x_i^*) \cdot f_G(x^*)$,

where

(4.3.18)    $E(\Lambda_i \mid x_i^*) = \int_{\Omega} \lambda_i\, g_i(\lambda_i \mid x_i^*)\, d\nu_i(\lambda_i)$

is the a posteriori mean for the i'th population, and

(4.3.19)    $f_G(x^*) = \prod_{j=1}^{k} f_{G_j}(x_j^*)$.


Since $f_G(x^*)$ is independent of $i$, we have from (4.3.14) that the
largest a posteriori mean gives the Bayes procedure with respect to $G$.
We have proved


Theorem 8: From each of $k$ populations let there be $r$ observations
taken on a random variable with density $f_{\lambda_i}(x)$ for $i = 1, 2, \ldots, k$.
If $\Lambda_i$ is distributed according to the density $g_i(\lambda_i)$ with respect
to some measure $\nu_i$, then the Bayes procedure for selecting the best
population under the linear loss function (4.2.2) is given by

(4.3.20)    $\delta_G(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $E(\Lambda_j \mid x_j^*) = \max_{1 \le i \le k} \{E(\Lambda_i \mid x_i^*)\}$.


Calculations may often be simplified if we use sufficient
statistics. Using the usual factorization criteria we see that

(4.3.21)    $f_{\lambda_j}(x_j^*) = h_{\lambda_j}(t_j)\, p(x_j^*)$,

where $h$ depends on the observations only through $t_j$, and $p$ does
not involve the parameter $\lambda_j$. We then have the following sufficiency
lemma.


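Lemma 2. Suppose $f_{\lambda_j}(x_j^*)$ admits a sufficient statistic $t_j = t_j(x_j^*)$
for $j = 1, 2, \ldots, k$. Then

            $\int_{\Omega^k} \lambda_j f_\lambda(x^*)\, dG(\lambda) = \max_{1 \le i \le k} \left\{ \int_{\Omega^k} \lambda_i f_\lambda(x^*)\, dG(\lambda) \right\}$

if and only if

(4.3.22)    $\int_{\Omega^k} \lambda_j h_\lambda(t)\, dG(\lambda) = \max_{1 \le i \le k} \left\{ \int_{\Omega^k} \lambda_i h_\lambda(t)\, dG(\lambda) \right\}$

where $h_\lambda(t) = \prod_{j=1}^{k} h_{\lambda_j}(t_j)$.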


Proof. Since $f_{\lambda_j}(x_j^*) = h_{\lambda_j}(t_j)\, p(x_j^*)$ from (4.3.21), we have

(4.3.23)    $f_\lambda(x^*) = \prod_{j=1}^{k} f_{\lambda_j}(x_j^*) = h_\lambda(t)\, p(x^*)$,

where $p(x^*) = \prod_{j=1}^{k} p(x_j^*)$ is independent of $\lambda$ and $i$. Thus

            $\max_{1 \le i \le k} \int_{\Omega^k} \lambda_i f_\lambda(x^*)\, dG(\lambda) = p(x^*) \max_{1 \le i \le k} \int_{\Omega^k} \lambda_i h_\lambda(t)\, dG(\lambda)$

from which the lemma follows.


Thus if we replace $E(\Lambda_j \mid x_j^*)$ by $E(\Lambda_j \mid t_j)$ in Theorem 8, the
computations are often reduced.


4.4. Empirical Bayes Procedures when G Belongs to a Particular
Parametric Family.

In  order  to  find  the  empirical  Bayes  procedures  for  selecting 
the  best  of  k  populations,  we  require  the  following  lemmas. 


Lemma 3: Let $G_j$ be a one-dimensional distribution function, and suppose
that $G_{n,j}$, $n = 1, 2, \ldots$, is a sequence of distribution functions
converging to $G_j$ at the points of continuity of $G_j$ for $j = 1, 2, \ldots, k$. Then
$G_{\pi,n} = \prod_{j=1}^{k} G_{n,j}$ converges to $G = \prod_{j=1}^{k} G_j$ at the points of continuity
of $G$.


Proof: Let $\lambda = (\lambda_1, \ldots, \lambda_k)$ be a point of continuity of $G$. If
$G(\lambda) \ne 0$, then $\lambda_j$ is a point of continuity of $G_j$ for $j = 1, 2, \ldots, k$,
and by hypothesis $\prod_{j=1}^{k} G_{n,j} \to \prod_{j=1}^{k} G_j$ at $\lambda$. If $G(\lambda) = 0$, then
$G_j(\lambda_j) = 0$ for some $j$; i.e. for $j \in K$, a subset of $\{1, 2, \ldots, k\}$
which is non-empty. Also, for some $j$ in $K$, $\lambda_j$ must be a continuity
point of $G_j$ for otherwise $\lambda$ would not be a continuity point of $G$.
Thus there exists a sequence $\{G_{n,j}\}$ such that $G_{n,j}(\lambda_j) \to G_j(\lambda_j) = 0$.
Therefore $G_{\pi,n} \to G$ at $\lambda$, and the proof is complete.


Lemma 4: Let $X_1^*, X_2^*, \ldots$ be independent $k \times r$-dimensional random
matrices consisting of $kr$ components which are independent random
variables. Suppose that $G_{n,j}$ is a distribution function on $\Omega$ which
is a function of $X_{1,j}^*, X_{2,j}^*, \ldots, X_{n,j}^*$ such that $G_{n,j}$ converges to
$G_j$, a distribution function on $\Omega$, with probability one for $j = 1, 2, \ldots, k$,
with probability being taken with respect to $X_{1,j}^*, X_{2,j}^*, \ldots$. Then
$G_{\pi,n} = \prod_{j=1}^{k} G_{n,j}$ converges to $G = \prod_{j=1}^{k} G_j$ with probability one, where
probability here is the product of the above probabilities.


Proof: Let $B_j = \{y = (x_{1,j}^*, x_{2,j}^*, \ldots) : G_{n,j} \to G_j \text{ at } y\}$. Then
by hypothesis, $P\{B_j\} = 1$ for $j = 1, 2, \ldots, k$. Using Lemma 3,
$G_{\pi,n} = \prod_{j=1}^{k} G_{n,j} \to G = \prod_{j=1}^{k} G_j$ for $y \in B = \prod_{j=1}^{k} B_j$. Thus
$P\{B\} = \prod_{j=1}^{k} P\{B_j\} = 1$, which completes the proof.


Thus if an estimate $\theta_{n,j}$ of the parameter $\theta_j$ of $G_j$ is
available such that the distribution function corresponding to $\theta_{n,j}$,
$G_{n,j}$, converges to $G_j$ with probability one, then by Lemma 4
$G_{\pi,n} = \prod_{j=1}^{k} G_{n,j}$ converges to $G = \prod_{j=1}^{k} G_j$ with probability one.
Therefore by Theorem 7 an empirical Bayes procedure with respect to $G$ is
obtained provided that $L(d_i, \lambda) f_\lambda(x^*)$ is bounded and continuous in
$\lambda$ for each $x^*$ and any $d_i \in \mathcal{D}$, $i = 1, 2, \ldots, k$, and that

            $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$.


If $\lambda_{[1]} = \min_{1 \le i \le k} \lambda_i$, then since $L(\lambda) = \max \{L(d_i, \lambda),\ d_i \in \mathcal{D}\}$
we have, using loss function (4.2.2),

(4.4.1)    $L(\lambda) = \lambda_{[k]} - \lambda_{[1]}$.

Therefore

(4.4.2)    $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$

if $E(|\Lambda_i|) < \infty$ for all $i = 1, 2, \ldots, k$.
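
The step from finite absolute means to (4.4.2) can be spelled out as follows. This is a short sketch, assuming $k \ge 2$ and the linear loss (4.2.2), for which $L(\lambda) = \lambda_{[k]} - \lambda_{[1]}$; the bound used is one convenient choice and is not claimed to be the one in the original argument.

```latex
L(\lambda) \;=\; \lambda_{[k]} - \lambda_{[1]}
  \;\le\; |\lambda_{[k]}| + |\lambda_{[1]}|
  \;\le\; \sum_{i=1}^{k} |\lambda_i| ,
\qquad\text{so that}\qquad
\int_{\Omega^k} L(\lambda)\, dG(\lambda)
  \;\le\; \sum_{i=1}^{k} E\bigl(|\Lambda_i|\bigr) \;<\; \infty .
```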


The Bayes procedure for selecting the best of $k$ populations
under linear loss structure (4.2.2) is given by (4.3.20). If $G$ is
unknown but exists, and prior observations $X_1^*, X_2^*, \ldots, X_n^*$ are available,
then using Lemmas 3 and 4 and Theorem 7 it follows that an empirical Bayes
procedure for selecting the best of $k$ populations is given by

(4.4.3)    $\delta_{G_{\pi,n}}(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$
           such that $E_n(\Lambda_j \mid x_j^*) = \max_{1 \le i \le k} \{E_n(\Lambda_i \mid x_i^*)\}$

where $E_n$ denotes expectation with respect to $G_{\pi,n}$, provided an
estimate $E_n(\Lambda_j \mid x_j^*)$ of $E(\Lambda_j \mid x_j^*)$ is available such that $G_{n,j}$,
the distribution function corresponding to $E_n(\Lambda_j \mid x_j^*)$, converges
to $G_j$ at the points of continuity of $G_j$ with probability one, and
the conditions of Theorem 7 are met.


4.5. Examples

As an example of the above technique consider the "normal-normal"
case where $f_{\lambda_i}(x_i)$ has a normal density with unknown mean $\lambda_i$ and known
variance $\sigma_i^2$, and $G_i$ has a normal density $g_i(\lambda_i)$ with unknown mean
$\theta_i$ and known variance $\beta_i^2$. We wish to compute

            $E(\Lambda_i \mid x_i^*) = \int \lambda_i\, g_i(\lambda_i \mid x_i^*)\, d\nu_i(\lambda_i)$,

where

            $g_i(\lambda_i \mid x_i^*) = \dfrac{f_{\lambda_i}(x_i^*)\, g_i(\lambda_i)}{f_{G_i}(x_i^*)}$.


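Since $f_{\lambda_i}(x_i)$ is $N(\lambda_i, \sigma_i^2)$ and the $r$ observations per population
are independent, we have for $i = 1, 2, \ldots, k$

(4.5.1)    $f_{\lambda_i}(x_i^*) = \prod_{\ell=1}^{r} f_{\lambda_i}(x_i^{(\ell)})$

           $= \prod_{\ell=1}^{r} (2\pi\sigma_i^2)^{-1/2} \exp\left\{ -\dfrac{1}{2\sigma_i^2} (x_i^{(\ell)} - \lambda_i)^2 \right\}$

           $= (2\pi\sigma_i^2)^{-r/2} \exp\left\{ -\dfrac{1}{2\sigma_i^2} \left[ r\left( \dfrac{1}{r} \sum_{\ell=1}^{r} (x_i^{(\ell)} - \bar{x}_i)^2 \right) + r(\bar{x}_i - \lambda_i)^2 \right] \right\}$

           $= (2\pi\sigma_i^2)^{-r/2} \exp\left\{ -\dfrac{1}{2\sigma_i^2} \left[ r s_i^2 + r(\bar{x}_i - \lambda_i)^2 \right] \right\}$,

where

(4.5.2)    $s_i^2 = \dfrac{1}{r} \sum_{\ell=1}^{r} (x_i^{(\ell)} - \bar{x}_i)^2$,

and

(4.5.3)    $\bar{x}_i = \dfrac{1}{r} \sum_{\ell=1}^{r} x_i^{(\ell)}$.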


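Since $g_i(\lambda_i)$ is $N(\theta_i, \beta_i^2)$ with respect to some measure $\nu_i$,
using (4.5.1) we have

            $f_{\lambda_i}(x_i^*)\, g_i(\lambda_i) = (2\pi\sigma_i^2)^{-r/2} (2\pi\beta_i^2)^{-1/2} \exp\left\{ -\dfrac{1}{2\sigma_i^2}\left[ r s_i^2 + r(\bar{x}_i - \lambda_i)^2 \right] - \dfrac{1}{2\beta_i^2}(\lambda_i - \theta_i)^2 \right\}$.

Now

            $f_{G_i}(x_i^*) = \int_{-\infty}^{\infty} (2\pi\sigma_i^2)^{-r/2} \exp\left\{ -\dfrac{1}{2\sigma_i^2}\left[ r s_i^2 + r(\bar{x}_i - \lambda_i)^2 \right] \right\} (2\pi\beta_i^2)^{-1/2} \exp\left\{ -\dfrac{1}{2\beta_i^2}(\lambda_i - \theta_i)^2 \right\} d\nu_i(\lambda_i)$

(4.5.4)    $= (2\pi)^{-(r+1)/2}\, \sigma_i^{-r} \beta_i^{-1} \exp\left\{ -\dfrac{r s_i^2}{2\sigma_i^2} \right\} \int_{-\infty}^{\infty} \exp\left\{ -\dfrac{r(\bar{x}_i - \lambda_i)^2}{2\sigma_i^2} - \dfrac{(\lambda_i - \theta_i)^2}{2\beta_i^2} \right\} d\nu_i(\lambda_i)$.

The integral in (4.5.4) is

(4.5.5)    $\int_{-\infty}^{\infty} \exp\left\{ -\dfrac{r}{2} \left( \dfrac{\bar{x}_i - \lambda_i}{\sigma_i} \right)^2 \right\} \exp\left\{ -\dfrac{1}{2} \left( \dfrac{\lambda_i - \theta_i}{\beta_i} \right)^2 \right\} d\nu_i(\lambda_i)$

           $= \int_{-\infty}^{\infty} \exp\left\{ -\dfrac{1}{2} \left[ r\left( \dfrac{\bar{x}_i - \lambda_i}{\sigma_i} \right)^2 + \left( \dfrac{\lambda_i - \theta_i}{\beta_i} \right)^2 \right] \right\} d\nu_i(\lambda_i)$.

Let

            $A = \dfrac{(\sigma_i^2 + r\beta_i^2)^{1/2}}{\sigma_i \beta_i}$,   $B = \dfrac{r\beta_i^2 \bar{x}_i + \sigma_i^2 \theta_i}{A\, \sigma_i^2 \beta_i^2}$,

and

            $C = \dfrac{r(\bar{x}_i - \theta_i)^2}{\sigma_i^2 + r\beta_i^2}$,

so that

(4.5.6)    $r\left( \dfrac{\bar{x}_i - \lambda_i}{\sigma_i} \right)^2 + \left( \dfrac{\lambda_i - \theta_i}{\beta_i} \right)^2 = (A\lambda_i - B)^2 + C$.

Substituting (4.5.6) in (4.5.5) we get

            $\int_{-\infty}^{\infty} \exp\left\{ -\dfrac{1}{2} \left[ (A\lambda_i - B)^2 + C \right] \right\} d\nu_i(\lambda_i)$

(4.5.7)    $= e^{-C/2}\, \dfrac{\sqrt{2\pi}}{A} = \dfrac{\sqrt{2\pi}\, \sigma_i \beta_i}{(\sigma_i^2 + r\beta_i^2)^{1/2}} \exp\left\{ -\dfrac{r(\bar{x}_i - \theta_i)^2}{2(\sigma_i^2 + r\beta_i^2)} \right\}$

since

            $\int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi}} \exp\left\{ -\dfrac{1}{2} (A\lambda_i - B)^2 \right\} d\nu_i(\lambda_i) = \dfrac{1}{A}$.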


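Substituting (4.5.7) in (4.5.4) we then get

            $f_{G_i}(x_i^*) = (2\pi)^{-(r+1)/2}\, \sigma_i^{-r} \beta_i^{-1} \exp\left\{ -\dfrac{r s_i^2}{2\sigma_i^2} \right\} \dfrac{\sqrt{2\pi}\, \sigma_i \beta_i}{(\sigma_i^2 + r\beta_i^2)^{1/2}} \exp\left\{ -\dfrac{r(\bar{x}_i - \theta_i)^2}{2(\sigma_i^2 + r\beta_i^2)} \right\}$

(4.5.8)    $= \dfrac{(2\pi)^{-r/2}\, \sigma_i^{-r+1}}{(\sigma_i^2 + r\beta_i^2)^{1/2}} \exp\left\{ -\left[ \dfrac{r s_i^2}{2\sigma_i^2} + \dfrac{r(\bar{x}_i - \theta_i)^2}{2(\sigma_i^2 + r\beta_i^2)} \right] \right\}$.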


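Thus

            $g_i(\lambda_i \mid x_i^*) = \dfrac{f_{\lambda_i}(x_i^*)\, g_i(\lambda_i)}{f_{G_i}(x_i^*)}$

            $= \dfrac{(2\pi)^{-(r+1)/2}\, \sigma_i^{-r} \beta_i^{-1} \exp\left\{ -\left[ \dfrac{r s_i^2}{2\sigma_i^2} + \dfrac{r(\bar{x}_i - \lambda_i)^2}{2\sigma_i^2} + \dfrac{(\lambda_i - \theta_i)^2}{2\beta_i^2} \right] \right\}}{(2\pi)^{-r/2}\, \sigma_i^{-r+1} (\sigma_i^2 + r\beta_i^2)^{-1/2} \exp\left\{ -\left[ \dfrac{r s_i^2}{2\sigma_i^2} + \dfrac{r(\bar{x}_i - \theta_i)^2}{2(\sigma_i^2 + r\beta_i^2)} \right] \right\}}$

            $= (2\pi)^{-1/2} (\sigma_i^2 + r\beta_i^2)^{1/2} (\sigma_i \beta_i)^{-1} \exp\left\{ -\dfrac{1}{2}\, \dfrac{\sigma_i^2 + r\beta_i^2}{\sigma_i^2 \beta_i^2} \left( \lambda_i - \dfrac{r\beta_i^2 \bar{x}_i + \sigma_i^2 \theta_i}{\sigma_i^2 + r\beta_i^2} \right)^2 \right\}$,

which is the normal density with mean

(4.5.9)    $\int_{-\infty}^{\infty} \lambda_i\, g_i(\lambda_i \mid x_i^*)\, d\nu_i(\lambda_i) = \dfrac{r\beta_i^2 \bar{x}_i + \sigma_i^2 \theta_i}{\sigma_i^2 + r\beta_i^2}$.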
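
As a numerical sanity check on the normal-normal a posteriori mean $(r\beta_i^2 \bar{x}_i + \sigma_i^2\theta_i)/(\sigma_i^2 + r\beta_i^2)$, the following Python sketch compares the closed form with a direct Riemann-sum evaluation of $\int \lambda\, f_\lambda(x^*) g(\lambda)\, d\lambda \big/ \int f_\lambda(x^*) g(\lambda)\, d\lambda$. The function names, the grid width and the sample values are assumptions chosen for illustration; normalising constants cancel in the ratio and are omitted.

```python
import math


def normal_posterior_mean(xbar, r, sigma2, theta, beta2):
    """Closed-form a posteriori mean for a N(lambda, sigma2) likelihood
    with r observations and a N(theta, beta2) prior on lambda."""
    return (r * beta2 * xbar + sigma2 * theta) / (sigma2 + r * beta2)


def posterior_mean_by_quadrature(xs, sigma2, theta, beta2, steps=20001):
    """Riemann-sum approximation of the same a posteriori mean."""
    r = len(xs)
    xbar = sum(xs) / r
    spread = 10.0 * math.sqrt(max(sigma2, beta2))   # generous grid half-width
    lo = min(theta, xbar) - spread
    hi = max(theta, xbar) + spread
    width = (hi - lo) / (steps - 1)
    num = 0.0
    den = 0.0
    for m in range(steps):
        lam = lo + m * width
        # log of likelihood times prior, up to additive constants
        log_w = (-sum((x - lam) ** 2 for x in xs) / (2.0 * sigma2)
                 - (lam - theta) ** 2 / (2.0 * beta2))
        w = math.exp(log_w)
        num += lam * w
        den += w
    return num / den


xs = [1.0, 2.0, 3.0]                     # r = 3 observations, sample mean 2
closed = normal_posterior_mean(sum(xs) / len(xs), len(xs), 4.0, 0.0, 1.0)
numeric = posterior_mean_by_quadrature(xs, 4.0, 0.0, 1.0)
```

The closed form is a weighted average of the sample mean and the prior mean, with weights $r\beta^2$ and $\sigma^2$, so more observations or a more diffuse prior pull the estimate toward $\bar{x}$.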


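If we assume that $\theta_i$ is finite, then by (4.4.2),

            $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$.

Also,

(4.5.10)    $L(d_i, \lambda)\, f_\lambda(x^*) = (\lambda_{[k]} - \lambda_i) \prod_{j=1}^{k} f_{\lambda_j}(x_j^*)$

            $= M (\lambda_{[k]} - \lambda_i) \exp\left\{ -\dfrac{r}{2} \sum_{j=1}^{k} \left( \dfrac{\bar{x}_j - \lambda_j}{\sigma_j} \right)^2 \right\}$

where

            $M = \prod_{j=1}^{k} (2\pi\sigma_j^2)^{-r/2} \exp\left( -\dfrac{r s_j^2}{2\sigma_j^2} \right)$

is constant with respect to $\lambda$ for fixed $x_j^*$ and where

            $s_j^2 = \dfrac{1}{r} \sum_{\ell=1}^{r} (x_j^{(\ell)} - \bar{x}_j)^2$

from before. Then (4.5.10) is bounded and continuous in $\lambda$ for given
$x^*$ and any $d_i \in \mathcal{D}$.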


If $r = 1$ in (4.5.8), it follows that the random variable
$X_j$ has a normal unconditional density with mean $\theta_j$ and variance $\sigma_j^2 + \beta_j^2$,
and thus the $nr$ prior independent observations on $X_j$ provide a suitable
estimate of $\theta_j$. Define

(4.5.11)    $\theta_{n,j} = \bar{\bar{x}}_j = \dfrac{1}{nr} \sum_{i=1}^{n} \sum_{\ell=1}^{r} x_{ij}^{(\ell)}$,

the overall average of $nr$ observations on $X_j$ where $j = 1, 2, \ldots, k$.
By the strong law of large numbers $\theta_{n,j} \to \theta_j$ a.s., so that defining $G_{n,j}$
by $G_j$ with $\theta_j$ replaced by $\theta_{n,j}$, we have that $G_{n,j} \to G_j$ with
probability one. Thus using Theorems 7 and 8, Lemmas 3 and 4, (4.4.3),
and (4.5.9) we see that an empirical Bayes procedure for selecting the
best of $k$ populations under the linear loss function (4.2.2) is given
by

(4.5.12)    $\delta_{G_{\pi,n}}(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{r\beta_j^2 \bar{x}_j + \sigma_j^2 \bar{\bar{x}}_j}{\sigma_j^2 + r\beta_j^2} = \max_{1 \le i \le k} \left\{ \dfrac{r\beta_i^2 \bar{x}_i + \sigma_i^2 \bar{\bar{x}}_i}{\sigma_i^2 + r\beta_i^2} \right\}$.
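
The normal-normal empirical Bayes selection rule, in which the unknown prior mean of each population is replaced by the overall average of its prior observations, can be sketched as follows. The data layout (`current[j]` holding the $r$ present observations and `prior[j]` holding $n$ earlier stages of $r$ observations each) and the function names are assumptions made for this illustration; indices are 0-based, whereas the decisions $d_j$ in the text are numbered from 1.

```python
def mean(values):
    return sum(values) / len(values)


def eb_scores_normal(current, prior, sigma2, beta2):
    """Score each population by (r*beta2*xbar + sigma2*grand) / (sigma2 + r*beta2).

    current[j] : list of the r present observations from population j
    prior[j]   : list of n earlier stages, each a list of r observations
    sigma2[j]  : known sampling variance; beta2[j] : known prior variance
    """
    scores = []
    for j, obs in enumerate(current):
        r = len(obs)
        xbar = mean(obs)                                      # present sample mean
        grand = mean([x for stage in prior[j] for x in stage])  # average of nr prior obs
        scores.append((r * beta2[j] * xbar + sigma2[j] * grand)
                      / (sigma2[j] + r * beta2[j]))
    return scores


def select_best(scores):
    """Return the (0-based) index of a largest score; ties go to the first."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two populations with identical present sample means but different histories.
current = [[1.0, 3.0], [2.0, 2.0]]
prior = [[[0.0, 0.0], [0.0, 0.0]], [[4.0, 4.0], [4.0, 4.0]]]
scores = eb_scores_normal(current, prior, sigma2=[1.0, 1.0], beta2=[1.0, 1.0])
best = select_best(scores)
```

In this toy input the present data cannot separate the populations, so the choice is driven entirely by the prior observations, which is the situation where the empirical Bayes rule differs most visibly from a rule based on $\bar{x}_j$ alone.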


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In  a  similar  manner  Deely  [2]  derives  Bayes  and  empirical  Bayes 
procedures  for  selecting  the  best  of  k  populations  for  the  following 
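cases.

(1) Normal-uniform. $f_{\lambda_j}(x_j)$ is $N(\lambda_j, \sigma_j^2)$ where $\sigma_j^2$ is known, and
$G_j$ is the distribution function for a uniform density on $(\theta_j - a_j, \theta_j + a_j)$.
The Bayes procedure with respect to

            $G = \prod_{j=1}^{k} G_j$,

where the $\theta_j$'s and $a_j$'s are known, and

            $\alpha_j = \left( \dfrac{r}{\sigma_j^2} \right)^{1/2} (\theta_j + a_j - \bar{x}_j)$

and

            $\beta_j = \left( \dfrac{r}{\sigma_j^2} \right)^{1/2} (\theta_j - a_j - \bar{x}_j)$,

where $\bar{x}_j = \dfrac{1}{r} \sum_{\ell=1}^{r} x_j^{(\ell)}$, is given by

(4.5.13)    $\delta_G(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{\varphi(\beta_j) - \varphi(\alpha_j)}{\Phi(\alpha_j) - \Phi(\beta_j)} + \left( \dfrac{r}{\sigma_j^2} \right)^{1/2} \bar{x}_j = \max_{1 \le i \le k} \left\{ \dfrac{\varphi(\beta_i) - \varphi(\alpha_i)}{\Phi(\alpha_i) - \Phi(\beta_i)} + \left( \dfrac{r}{\sigma_i^2} \right)^{1/2} \bar{x}_i \right\}$

where $\varphi(u)$ is the standard normal density function, and $\Phi(u)$ is the
standard normal distribution function. Assuming that $a_j$ is known, and
$\theta_j$ is unknown but finite for $j = 1, 2, \ldots, k$, the empirical Bayes procedure
based on $nr$ prior observations and which is asymptotically optimal to
$\delta_G$ is given by

(4.5.14)    $\delta_{G_{\pi,n}}(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{\varphi(\beta'_j) - \varphi(\alpha'_j)}{\Phi(\alpha'_j) - \Phi(\beta'_j)} + \left( \dfrac{r}{\sigma_j^2} \right)^{1/2} \bar{x}_j = \max_{1 \le i \le k} \left\{ \dfrac{\varphi(\beta'_i) - \varphi(\alpha'_i)}{\Phi(\alpha'_i) - \Phi(\beta'_i)} + \left( \dfrac{r}{\sigma_i^2} \right)^{1/2} \bar{x}_i \right\}$

where

            $\alpha'_j = \left( \dfrac{r}{\sigma_j^2} \right)^{1/2} (\bar{\bar{x}}_j + a_j - \bar{x}_j)$

and

            $\beta'_j = \left( \dfrac{r}{\sigma_j^2} \right)^{1/2} (\bar{\bar{x}}_j - a_j - \bar{x}_j)$

for $j = 1, 2, \ldots, k$, where

            $\bar{\bar{x}}_j = \dfrac{1}{nr} \sum_{i=1}^{n} \sum_{\ell=1}^{r} x_{ij}^{(\ell)}$.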


(2) Binomial-beta.

            $f_{\lambda_j}(x_j) = \dbinom{u_j}{x_j} \lambda_j^{x_j} (1 - \lambda_j)^{u_j - x_j}$

where $u_j$ is the number of trials, $x_j$ is the number of successes, and
$\lambda_j$ is the probability of success on a single trial. $G_j$ is the distribution
function for a beta density $g_j(\lambda_j) = c_j \lambda_j^{\theta_j - 1} (1 - \lambda_j)^{v_j - \theta_j - 1}$, where $v_j$
and $\theta_j$ are non-negative parameters, and

            $c_j = \dfrac{\Gamma(v_j)}{\Gamma(\theta_j)\, \Gamma(v_j - \theta_j)}$.

The Bayes procedure for selecting the best population under the linear
loss function (4.2.2) when $v_j$ and $\theta_j$ are known is given by

(4.5.15)    $\delta_G(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{\bar{x}_j + \frac{1}{r} \theta_j}{u_j + \frac{1}{r} v_j} = \max_{1 \le i \le k} \left\{ \dfrac{\bar{x}_i + \frac{1}{r} \theta_i}{u_i + \frac{1}{r} v_i} \right\}$.

Assuming that $v_j$ is known and $\theta_j$ is unknown but finite for $j = 1, 2, \ldots, k$,
an empirical Bayes procedure based on $nr$ prior observations which is
asymptotically optimal to $\delta_G$ is given by

(4.5.16)    $\delta_{G_{\pi,n}}(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{\bar{x}_j + \frac{1}{r} \frac{v_j}{u_j} \bar{\bar{x}}_j}{u_j + \frac{1}{r} v_j} = \max_{1 \le i \le k} \left\{ \dfrac{\bar{x}_i + \frac{1}{r} \frac{v_i}{u_i} \bar{\bar{x}}_i}{u_i + \frac{1}{r} v_i} \right\}$

where

            $\bar{\bar{x}}_j = \dfrac{1}{nr} \sum_{i=1}^{n} \sum_{\ell=1}^{r} x_{ij}^{(\ell)}$.


(3) Poisson-gamma.

            $f_{\lambda_j}(x_j) = \dfrac{e^{-\lambda_j} \lambda_j^{x_j}}{x_j!}$

for $x_j = 0, 1, 2, \ldots$, and $\lambda_j > 0$, and $G_j(\lambda_j)$ is the distribution
function for the gamma density

            $g_j(\lambda_j) = \dfrac{a_j^{\theta_j}\, \lambda_j^{\theta_j - 1}\, e^{-\lambda_j a_j}}{\Gamma(\theta_j)}$

in which $a_j$ and $\theta_j$ are non-negative parameters. Then the Bayes
procedure with respect to $G$ for selecting the best population under
the linear loss function (4.2.2) and when $a_j$, $\theta_j$ are known, is given
by

(4.5.17)    $\delta_G(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{r\bar{x}_j + \theta_j}{r + a_j} = \max_{1 \le i \le k} \left\{ \dfrac{r\bar{x}_i + \theta_i}{r + a_i} \right\}$.

Assuming that $a_j$ is known and $\theta_j$ is unknown but finite for $j = 1, 2, \ldots, k$,
the empirical Bayes procedure which is based on $nr$ prior observations
and is asymptotically optimal to $\delta_G$ is given by

(4.5.18)    $\delta_{G_{\pi,n}}(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{r\bar{x}_j + a_j \bar{\bar{x}}_j}{r + a_j} = \max_{1 \le i \le k} \left\{ \dfrac{r\bar{x}_i + a_i \bar{\bar{x}}_i}{r + a_i} \right\}$.
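
For the Poisson-gamma case, the Bayes score $(r\bar{x}_j + \theta_j)/(r + a_j)$ and its empirical counterpart $(r\bar{x}_j + a_j\bar{\bar{x}}_j)/(r + a_j)$ differ only through the estimate $a_j \bar{\bar{x}}_j$ of $\theta_j$. The simulation below is a rough illustration of how the two scores approach each other as prior data accumulate. The seed, the parameter values, the function names and the use of Knuth's multiplication method for Poisson draws are all assumptions made for this sketch.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    prod = rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def bayes_score(xbar, r, a, theta):
    """A posteriori mean when the gamma parameters a and theta are known."""
    return (r * xbar + theta) / (r + a)


def eb_score(xbar, r, a, grand_mean):
    """Same score with theta replaced by a * (overall prior average)."""
    return (r * xbar + a * grand_mean) / (r + a)


def simulate_grand_mean(theta, a, n, r, rng):
    """Average of n*r prior observations; each stage draws its own lambda."""
    total = 0
    for _ in range(n):
        lam = rng.gammavariate(theta, 1.0 / a)   # shape theta, scale 1/a
        total += sum(poisson_draw(lam, rng) for _ in range(r))
    return total / (n * r)


rng = random.Random(0)
theta, a, r = 3.0, 1.0, 5
grand = simulate_grand_mean(theta, a, n=4000, r=r, rng=rng)
gap = abs(eb_score(2.4, r, a, grand) - bayes_score(2.4, r, a, theta))
```

Because the unconditional mean of each observation is $\theta_j / a_j$, the overall average converges to that ratio, and `gap` shrinks accordingly as `n` grows.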
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4.6.  Empirical  Bayes  Procedures .  The  General  Case . 

Suppose  we  drop  the  assumption  that  the  a  priori  distribution 
G  is  a  member  of  some  particular  parametric  family,  and  assume  instead 
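that $G \in \mathcal{G}$ where

(4.6.1)    $\mathcal{G} = \left\{ G : \int_{\Omega^k} L(d_i, \lambda)\, dG(\lambda) < \infty \text{ for } i = 1, 2, \ldots, k \right\}$.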


If we define $\Delta_G(d_i, x^*)$ by (4.3.4) and $\Delta_{i,n}(x^*)$ by (4.3.7) such that
$\Delta_{i,n}(x^*) \xrightarrow{P} \Delta_G(d_i, x^*)$, then the sequence of procedures $D = \{\delta_n\}$
defined by (4.3.9) is asymptotically optimal relative to $G$ if

            $\int_{\Omega^k} L(\lambda)\, dG(\lambda) < \infty$.

From (4.4.2) the above condition will be satisfied for $G \in \mathcal{H}$ where

(4.6.2)    $\mathcal{H} = \left\{ G : G = \prod_{j=1}^{k} G_j \text{ and } G_j \text{ is a distribution function on } \Omega \text{ such that } \int_{\Omega} |\lambda_j|\, dG_j(\lambda_j) < \infty \text{ for } j = 1, 2, \ldots, k \right\}$.


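Using loss function (4.2.2) we have

(4.6.3)    $\Delta_G(d_i, x^*) = \int_{\Omega^k} [L(d_i, \lambda) - L(d_1, \lambda)]\, f_\lambda(x^*)\, dG(\lambda)$

           $= \int_{\Omega^k} (\lambda_1 - \lambda_i)\, f_\lambda(x^*)\, dG(\lambda)$

for $i = 2, 3, \ldots, k$. Now

            $\int_{\Omega^k} \lambda_i f_\lambda(x^*)\, dG(\lambda) = \left\{ \int_{\Omega} \lambda_i f_{\lambda_i}(x_i^*)\, dG_i(\lambda_i) \right\} \prod_{j \ne i} \left\{ \int_{\Omega} f_{\lambda_j}(x_j^*)\, dG_j(\lambda_j) \right\}$.

If

            $\gamma_i(x_i^*) = \int_{\Omega} \lambda_i f_{\lambda_i}(x_i^*)\, dG_i(\lambda_i)$

and

            $f_{G_j}(x_j^*) = \int_{\Omega} f_{\lambda_j}(x_j^*)\, dG_j(\lambda_j)$,

then from (4.6.3) we have for $i = 1, 2, \ldots, k$,

(4.6.4)    $\Delta_G(d_i, x^*) = \left\{ \gamma_1(x_1^*) \prod_{j=2}^{k} f_{G_j}(x_j^*) \right\} - \left\{ \gamma_i(x_i^*) \prod_{j \ne i} f_{G_j}(x_j^*) \right\}$.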


If we can find functions $f_{n,j}(x_j^*)$ and $\gamma_{n,j}(x_j^*)$ of the prior
observations such that

(4.6.5)    $f_{n,j}(x_j^*) \xrightarrow{P} f_{G_j}(x_j^*)$,

and

(4.6.6)    $\gamma_{n,j}(x_j^*) \xrightarrow{P} \gamma_j(x_j^*)$

for $j = 1, 2, \ldots, k$, then defining

(4.6.7)    $\Delta_{i,n}(x^*) = \left\{ \gamma_{n,1}(x_1^*) \prod_{j \ne 1} f_{n,j}(x_j^*) \right\} - \left\{ \gamma_{n,i}(x_i^*) \prod_{j \ne i} f_{n,j}(x_j^*) \right\}$

we have $\Delta_{i,n}(x^*) \xrightarrow{P} \Delta_G(d_i, x^*)$ for $i = 1, 2, \ldots, k$, and an empirical
Bayes procedure will be given by (4.3.9).


Due to the independence of the observations we have

(4.6.8)    $f_{G_j}(x_j^*) = \prod_{\ell=1}^{r} f_{G_j}(x_j^{(\ell)}) = \prod_{\ell=1}^{r} \left\{ \int_{\Omega} f_{\lambda_j}(x_j^{(\ell)})\, dG_j(\lambda_j) \right\}$.

The random variable $X_{ij}^{(\ell)}$ has the same marginal density $f_{G_j}(x_j)$ for
each $i = 1, 2, \ldots, n$, and therefore we can use the prior observations
as well as the present observation to find an estimate of $f_{G_j}(x_j)$ at
the $r$ points $x_j^{(1)}, x_j^{(2)}, \ldots, x_j^{(r)}$. Defining the empirical distribution
function for the j'th population to be

(4.6.9)    $F_{n,j}(x_j^{(\ell)}) = \dfrac{1}{(n+1)r}$ (total number of prior observations
           from the j'th population which are $\le x_j^{(\ell)}$),

we have from (2.3.17) and (2.3.16) that

(4.6.10)    $f_{n,j}(x_j^{(\ell)}) = \dfrac{F_{n,j}(x_j^{(\ell)} + h_n) - F_{n,j}(x_j^{(\ell)} - h_n)}{2h_n}$

where $h_n = d\, n^{-1/5}$, $d > 0$ being some constant, for $\ell = 1, 2, \ldots, r$
and $j = 1, 2, \ldots, k$. Since (4.6.10) implies that

(4.6.11)    $f_{n,j}(x_j^{(\ell)}) \xrightarrow{P} f_{G_j}(x_j^{(\ell)})$

for $j = 1, 2, \ldots, k$, a sequence $f_{n,j}(x_j^*)$ in (4.6.7) can always be
found. Therefore we are left with the problem of finding a sequence
$\{\gamma_{n,j}(x_j^*)\}$ which converges in probability to $\{\gamma_j(x_j^*)\}$. If such a
sequence can be found, then the empirical Bayes procedure $\delta_n$ is given
by (4.3.9). Since $f_{n,j}(x_j^*) > 0$, we have from (4.6.7)


(4.6.12)    $\Delta_{i,n}(x^*) = \left\{ \dfrac{\gamma_{n,1}(x_1^*)}{f_{n,1}(x_1^*)} - \dfrac{\gamma_{n,i}(x_i^*)}{f_{n,i}(x_i^*)} \right\} f_{\pi,n}(x^*)$,

where

(4.6.13)    $f_{\pi,n}(x^*) = \prod_{j=1}^{k} f_{n,j}(x_j^*)$.

Now (4.6.13) is independent of $i$ so that

(4.6.14)    $\Delta_{j,n}(x^*) = \min_{1 \le i \le k} \Delta_{i,n}(x^*)$

if and only if

(4.6.15)    $\dfrac{\gamma_{n,j}(x_j^*)}{f_{n,j}(x_j^*)} = \max_{1 \le i \le k} \left\{ \dfrac{\gamma_{n,i}(x_i^*)}{f_{n,i}(x_i^*)} \right\}$.

Thus the empirical Bayes procedure with respect to any $G \in \mathcal{H}$ for
selecting the best of $k$ populations is given by

(4.6.16)    $\delta_n(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such
            that $\dfrac{\gamma_{n,j}(x_j^*)}{f_{n,j}(x_j^*)} = \max_{1 \le i \le k} \left\{ \dfrac{\gamma_{n,i}(x_i^*)}{f_{n,i}(x_i^*)} \right\}$

provided we can find a sequence $\{\gamma_{n,i}(x_i^*)\}$ which converges in probability
to $\{\gamma_i(x_i^*)\}$.
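
The density estimate built from the empirical distribution function, $f_n(x) = [F_n(x + h_n) - F_n(x - h_n)] / (2h_n)$ with $h_n = d\, n^{-1/5}$, can be sketched as follows. This is a minimal illustration for a single population: the function name `make_density_estimate` is hypothetical, and the pooled list is assumed to contain every observation that enters the empirical distribution function, so that its length plays the role of the normalising count.

```python
import bisect


def make_density_estimate(pooled, n, d=1.0):
    """Return f_n(x) = [F_n(x + h_n) - F_n(x - h_n)] / (2 h_n), h_n = d * n**(-1/5).

    pooled : observations from one population used to form F_n
    n      : number of prior stages (controls the window width h_n)
    d      : positive constant in the window width (assumed default 1.0)
    """
    data = sorted(pooled)
    total = len(data)
    h = d * n ** (-1.0 / 5.0)

    def cdf(x):
        # proportion of pooled observations that are <= x
        return bisect.bisect_right(data, x) / total

    def density(x):
        return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

    return density


f_n = make_density_estimate([0.0, 0.5, 1.0, 1.5], n=1)
inside = f_n(0.75)   # window (-0.25, 1.75] covers all four points
outside = f_n(5.0)   # window (4.0, 6.0] covers none
```

The estimate is a difference quotient of a step function, so it is piecewise constant in $x$; the window $h_n$ shrinks slowly with $n$ so that each window still collects a growing number of observations.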


As an example of the above technique, consider the case where
we have the class of densities given by

(4.6.17)    $f_{\lambda_j}(x_j) = \begin{cases} \lambda_j^{x_j}\, g(x_j)\, h(\lambda_j), & x_j > a, \\ 0, & x_j \le a, \end{cases}$

where $a$ is some constant and where $\Lambda_j$ is distributed according to
$G_j$ such that

            $G = \prod_{j=1}^{k} G_j \in \mathcal{H}$.

Then

            $\gamma_j(x_j^*) = \int_{\Omega} \lambda_j f_{\lambda_j}(x_j^*)\, dG_j(\lambda_j)$

            $= \int_{\Omega} \lambda_j \prod_{\ell=1}^{r} \left[ \lambda_j^{x_j^{(\ell)}} g(x_j^{(\ell)})\, h(\lambda_j) \right] dG_j(\lambda_j)$

(4.6.18)    $= \dfrac{g(x_j^{(1)})}{g(x_j^{(1)} + 1)} \int_{\Omega} \lambda_j^{x_j^{(1)} + 1} g(x_j^{(1)} + 1)\, h(\lambda_j) \prod_{\ell=2}^{r} \lambda_j^{x_j^{(\ell)}} g(x_j^{(\ell)})\, h(\lambda_j)\, dG_j(\lambda_j)$

            $= \dfrac{g(x_j^{(1)})}{g(x_j^{(1)} + 1)} \int_{\Omega} f_{\lambda_j}(x_j^{(1)} + 1) \prod_{\ell=2}^{r} f_{\lambda_j}(x_j^{(\ell)})\, dG_j(\lambda_j)$

            $= \dfrac{g(x_j^{(1)})}{g(x_j^{(1)} + 1)}\, f_{G_j}(x_j^{(1)} + 1) \prod_{\ell=2}^{r} f_{G_j}(x_j^{(\ell)})$

            $= \dfrac{g(x_j^{(1)})}{g(x_j^{(1)} + 1)}\, \dfrac{f_{G_j}(x_j^{(1)} + 1)}{f_{G_j}(x_j^{(1)})}\, f_{G_j}(x_j^*)$,

where $x_j^{(1)}$ is arbitrary and determined by convenience only, and where

            $f_{G_j}(x_j^*) = \prod_{\ell=1}^{r} f_{G_j}(x_j^{(\ell)}) = \int_{\Omega} \prod_{\ell=1}^{r} f_{\lambda_j}(x_j^{(\ell)})\, dG_j(\lambda_j)$.

Since the function $g$ is known, and $f_{n,j}(x_j^{(\ell)})$ defined in (4.6.10)
converges in probability to $f_{G_j}(x_j^{(\ell)})$, then $\gamma_{n,j}(x_j^*)$ defined by

(4.6.19)    $\gamma_{n,j}(x_j^*) = \dfrac{g(x_j^{(1)})}{g(x_j^{(1)} + 1)}\, f_{n,j}(x_j^{(1)} + 1) \prod_{\ell=2}^{r} f_{n,j}(x_j^{(\ell)})$

converges in probability to $\gamma_j(x_j^*)$. Writing

(4.6.20)    $\gamma_{n,j}(x_j^*) = \left\{ \dfrac{g(x_j^{(1)})\, f_{n,j}(x_j^{(1)} + 1)}{g(x_j^{(1)} + 1)\, f_{n,j}(x_j^{(1)})} \right\} f_{n,j}(x_j^*)$

where

            $f_{n,j}(x_j^*) = \prod_{\ell=1}^{r} f_{n,j}(x_j^{(\ell)})$,

and since $f_{n,j}(x_j^*) > 0$, we see that the empirical Bayes procedure
for selecting the best of $k$ populations with respect to any $G \in \mathcal{H}$
and using loss function (4.2.2) is given by

(4.6.21)    $\delta_n(x^*) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such that
            $\dfrac{g(x_j^{(1)})\, f_{n,j}(x_j^{(1)} + 1)}{g(x_j^{(1)} + 1)\, f_{n,j}(x_j^{(1)})} = \max_{1 \le i \le k} \left\{ \dfrac{g(x_i^{(1)})\, f_{n,i}(x_i^{(1)} + 1)}{g(x_i^{(1)} + 1)\, f_{n,i}(x_i^{(1)})} \right\}$.

After a suitable transformation a large class of densities can be written
in the form (4.6.17).
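
The selection statistic for densities of the form $\lambda^{x} g(x) h(\lambda)$, namely the ratio $g(x^{(1)}) f_n(x^{(1)} + 1) / [g(x^{(1)} + 1) f_n(x^{(1)})]$, needs only the known function $g$ and an estimate $f_n$ of the marginal density. The sketch below takes both as callables. The function names are hypothetical, and the two "estimates" used in the example are exact stand-in functions chosen so the ratios are easy to read, not estimates computed from data.

```python
import math


def ratio_statistic(x1, g, f_n):
    """g(x1) * f_n(x1 + 1) / (g(x1 + 1) * f_n(x1)); requires f_n(x1) > 0."""
    return g(x1) * f_n(x1 + 1.0) / (g(x1 + 1.0) * f_n(x1))


def select_population(first_obs, g, estimates):
    """Return (0-based index of a largest ratio, list of all ratios).

    first_obs[j] : the observation chosen as x^{(1)} for population j
    estimates[j] : callable standing in for the density estimate of population j
    """
    scores = [ratio_statistic(x, g, f) for x, f in zip(first_obs, estimates)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def g(x):
    # Assumed known factor; constant here purely for illustration.
    return 1.0


# Stand-in marginal densities proportional to lambda**x with lambda = e**-1, e**-2.
estimates = [lambda x: math.exp(-x), lambda x: math.exp(-2.0 * x)]
best, scores = select_population([0.5, 0.5], g, estimates)
```

With these stand-ins each ratio returns the corresponding value of $\lambda$, which shows why the statistic acts as an estimate of the a posteriori mean.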


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CHAPTER  V 

NON-PARAMETRIC EMPIRICAL BAYES ESTIMATION
AND HYPOTHESIS TESTING

The non-parametric case occurs when the class of conditional
probability distributions of $X$ given $\Lambda$ is not restricted to a particular
parametric family but is a much larger class of probability distributions
which cannot be defined in terms of a finite number of parameters. In
the discrete case, for example, it may be the class of all probability
functions assigning probability one to some specified denumerable set
of numbers $\mathcal{X} = \{x\}$. Let $\mathcal{F} = \{F_\omega(x) : \omega \in \Omega\}$ be the class of all
conditional distribution functions such that probability one is assigned
to $\mathcal{X}$, where $\Omega = \{\omega\}$ is an abstract indexing set. Let $\mu$ be the a
priori probability measure defined on $\mathcal{B}$, a $\sigma$-algebra of subsets of
$\Omega$, and let $Y$ be an $\Omega$-valued random variable which is the identity
mapping of $\Omega$ onto itself.

5.1.  Estimation: Discrete Case.

Consider the random variables $X_1, X_2, \ldots, X_r$ which are conditionally
independent and identically distributed with common conditional
distribution function $F_\omega(x)$ given that $Y = \omega$. Define the random variable
$\Lambda$ for a given measurable function $h(x)$ by

(5.1.1)  $\Lambda = \Lambda(Y) = E(h(X) \mid Y)$ ,


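where $X$ is a generic representation of the $X_j$'s, and assume that
the a priori probability measure space $(\Omega, \mathcal{A}, \mu)$ is such that

(5.1.2)  $E\,h^2(X) < \infty$ .

Using $\mathbf{X} = (X_1, X_2, \ldots, X_r)$, the vector of observations, we wish to
obtain an estimate of the value $\lambda$ of $\Lambda$. If $\psi(\mathbf{x})$ is an estimator
of $\lambda$, we will use the usual squared error loss function

(5.1.3)  $L(\psi(\mathbf{x}), \lambda) = (\psi(\mathbf{x}) - \lambda)^2$ ,

where $\mathbf{x} = (x_1, x_2, \ldots, x_r)$. The risk involved in using any estimator
$\psi(\mathbf{x})$ is

(5.1.4)  $R(\psi) = E\,L(\psi(\mathbf{X}), \Lambda) = E[\psi(\mathbf{X}) - \Lambda]^2$ .

From (5.1.1) and (5.1.2) we see that

(5.1.5)  $E\Lambda^2 = E\{E^2(h(X) \mid Y)\} \le E\{E(h^2(X) \mid Y)\} = E\,h^2(X) < \infty$ ,

and hence

(5.1.6)
$$\begin{aligned}
R(\psi) &= E\{E[(\psi(\mathbf{X}) - \Lambda)^2 \mid \mathbf{X}]\} \\
&= E\{[\psi^2(\mathbf{X}) - 2\psi(\mathbf{X})E(\Lambda \mid \mathbf{X}) + E^2(\Lambda \mid \mathbf{X})] + E(\Lambda^2 \mid \mathbf{X}) - E^2(\Lambda \mid \mathbf{X})\} \\
&= E\{[\psi(\mathbf{X}) - E(\Lambda \mid \mathbf{X})]^2 + E(\Lambda^2 \mid \mathbf{X}) - E^2(\Lambda \mid \mathbf{X})\} \\
&= E\{[\psi(\mathbf{X}) - E(\Lambda \mid \mathbf{X})]^2\} + E\{\mathrm{Var}(\Lambda \mid \mathbf{X})\} .
\end{aligned}$$

This is a minimum when $\psi(\mathbf{x}) = E(\Lambda \mid \mathbf{X} = \mathbf{x})$, and hence the Bayes estimator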


$\psi_\mu(\mathbf{x})$ which minimizes $R(\psi)$ is

(5.1.7)  $\psi_\mu(\mathbf{x}) = E(\Lambda \mid \mathbf{X} = \mathbf{x})$

for all $\mathbf{x} \in \mathcal{X}^{*} = \{\mathbf{x} : P(\mathbf{X} = \mathbf{x}) > 0\}$, and from (5.1.6) the risk of the
Bayes estimator is

(5.1.8)  $R(\psi_\mu) = E\Lambda^2 - E\psi_\mu^2(\mathbf{X}) < \infty$ .
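The decomposition (5.1.6) and the Bayes risk (5.1.8) can be checked by exact enumeration on a small example. The sketch below is only an illustration: the two-point prior, the Binomial(2, Λ) model, and all function names are assumptions introduced here and do not come from the source.

```python
from math import comb

# Hypothetical toy model, not taken from the source:
# Lambda is 0.2 or 0.7 with equal prior weight, X | Lambda ~ Binomial(2, Lambda).
PRIOR = {0.2: 0.5, 0.7: 0.5}
TRIALS = 2
XS = range(TRIALS + 1)

def pmf(x, lam):
    """P(X = x | Lambda = lam)."""
    return comb(TRIALS, x) * lam**x * (1 - lam) ** (TRIALS - x)

def marginal(x):
    """P(X = x) under the prior."""
    return sum(w * pmf(x, lam) for lam, w in PRIOR.items())

def posterior_moment(x, power):
    """E(Lambda**power | X = x)."""
    return sum(w * pmf(x, lam) * lam**power for lam, w in PRIOR.items()) / marginal(x)

def risk(psi):
    """R(psi) = E[psi(X) - Lambda]^2 by exact enumeration, as in (5.1.4)."""
    return sum(w * pmf(x, lam) * (psi(x) - lam) ** 2
               for lam, w in PRIOR.items() for x in XS)

def bayes(x):
    """Posterior mean, the Bayes estimator of (5.1.7)."""
    return posterior_moment(x, 1)

def naive(x):
    """Sample proportion, used only as a competing estimator."""
    return x / TRIALS

# The two terms on the last line of (5.1.6), evaluated for the naive estimator.
expected_posterior_variance = sum(
    marginal(x) * (posterior_moment(x, 2) - bayes(x) ** 2) for x in XS)
excess = sum(marginal(x) * (naive(x) - bayes(x)) ** 2 for x in XS)
```

In this toy model `risk(naive)` equals `excess + expected_posterior_variance` up to rounding, and `risk(bayes)` equals `expected_posterior_variance`, which is the pattern (5.1.6) predicts.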


As before the a priori probability structure of the problem
must be known before we can obtain the Bayes estimator in (5.1.7).
Assume that this structure is unknown but we have additional information
in the form of vectors of observations

$\mathbf{X}_i = (X_{i,1}, X_{i,2}, \ldots, X_{i,r+1})$  for $i = 1, 2, \ldots, n$ ,

where the $\mathbf{X}_i$'s are mutually independent and independent of $\mathbf{X}$, and
where for each $i$ the $X_{i,j}$'s are conditionally independent and
identically distributed according to $F_{\omega_i}(x)$ given that $Y_i = \omega_i$,
where the $Y_i$, $i = 1, 2, \ldots, n$, are mutually independent, independent of $Y$,
and have the same distribution as $Y$.

If we let $\mathbf{X}_i^{(r)} = (X_{i,1}, X_{i,2}, \ldots, X_{i,r})$, then $\mathbf{X}_i^{(r)}$ and
$E(h(X_{i,r+1}) \mid Y_i)$ have the same joint distribution as $\mathbf{X}$ and $\Lambda$, so
that



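(5.1.9)
$$\begin{aligned}
E[h(X_{i,r+1}) \mid \mathbf{X}_i^{(r)} = \mathbf{x}]
&= E\{E[h(X_{i,r+1}) \mid Y_i, \mathbf{X}_i^{(r)}] \mid \mathbf{X}_i^{(r)} = \mathbf{x}\} \\
&= E\{E[h(X_{i,r+1}) \mid Y_i] \mid \mathbf{X}_i^{(r)} = \mathbf{x}\} \\
&= E(\Lambda \mid \mathbf{X} = \mathbf{x}) = \psi_\mu(\mathbf{x}) .
\end{aligned}$$

This suggests the following empirical Bayes estimation procedure.

If $\mathbf{x} = (x_1, x_2, \ldots, x_r)$, then for each $\mathbf{x}$ we can permute
the components to get $m(\mathbf{x})$ distinct vectors with $1 \le m(\mathbf{x}) \le r!$. If
$\mathbf{x}_{(q)}$, $q = 1, 2, \ldots, m(\mathbf{x})$, denote the distinct vectors, then define the
random functions $M_i(\mathbf{x})$, $i = 1, 2, \ldots, n$, and $M_n(\mathbf{x})$ by

(5.1.10)  $M_i(\mathbf{x}) = \begin{cases} 1 , & \text{if there exists a } q,\ 1 \le q \le m(\mathbf{x}), \text{ such that } \mathbf{X}_i^{(r)} = \mathbf{x}_{(q)} , \\ 0 , & \text{otherwise} , \end{cases}$

and

(5.1.11)  $M_n(\mathbf{x}) = \sum_{i=1}^{n} M_i(\mathbf{x})$ .

If the empirical Bayes estimator is defined by

(5.1.12)  $\psi_n(\mathbf{x}) = \begin{cases} \dfrac{1}{M_n(\mathbf{x})} \sum_{i=1}^{n} M_i(\mathbf{x})\, h(X_{i,r+1}) , & M_n(\mathbf{x}) > 0 , \\ 0 , & \text{otherwise} , \end{cases}$

then Johns [6] proves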


Theorem 6:  Let $X$ be a generic representation of $r \ge 1$ independent
random variables which are identically distributed according to
$F_\omega(x) \in \mathcal{F}_1$, and let there be a measurable function $h(x)$ such that
(5.1.1) holds. Let $\mathbf{X}$ be the vector of the present $r$ observations,
and let $\mathbf{X}_i$, $i = 1, 2, \ldots, n$, be the prior observations, where the $\mathbf{X}_i$'s
are mutually independent and independent of $\mathbf{X}$ and where each $\mathbf{X}_i$
consists of $r+1$ independent random variables which are identically
distributed according to $F_{\omega_i}(x) \in \mathcal{F}_1$. If $(\Omega, \mathcal{A}, \mu)$ is such that
(5.1.2) holds, then, using loss function (5.1.3),

(5.1.13)  $\lim_{n \to \infty} R(\psi_n) = R(\psi_\mu)$ ,

where $R(\psi_\mu)$ is the risk of the Bayes estimator and $R(\psi_n)$ is the
risk of the empirical Bayes estimator $\psi_n$ defined in (5.1.12).

Instead of proving the above theorem, we will prove, in the
following section, the above result for a slightly different case; namely
the supplementary sample method of Krutchkoff [8], which was derived from
Johns' non-parametric result by dropping the requirement that an unbiased
estimate of $\lambda$ be known at the time the estimate is to be made. The
reader is referred to [6] for Johns' original proof.
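The estimator (5.1.12) amounts to averaging $h$ of the extra component over those prior vectors whose first $r$ components match the current observation vector up to order. A minimal sketch, assuming observations are stored as Python tuples; the function name `johns_estimate` and the default identity `h` are illustrative choices, not notation from the source.

```python
from itertools import permutations

def johns_estimate(x, prior_vectors, h=lambda v: v):
    """Empirical Bayes estimate in the spirit of (5.1.12).

    x             -- tuple of the r current observations
    prior_vectors -- iterable of tuples of length r + 1
    h             -- the known measurable function (identity by default, an assumption)
    """
    r = len(x)
    # The m(x) distinct rearrangements x_(q) of the components of x.
    rearrangements = set(permutations(x))
    total, count = 0.0, 0
    for vec in prior_vectors:
        if tuple(vec[:r]) in rearrangements:   # M_i(x) = 1
            total += h(vec[r])                 # h(X_{i, r+1})
            count += 1                         # accumulates M_n(x)
    return total / count if count > 0 else 0.0

# Usage with r = 2: the first two prior vectors match (1, 2) up to order.
prior = [(1, 2, 5), (2, 1, 7), (1, 1, 9), (3, 3, 4)]
estimate = johns_estimate((1, 2), prior)
```

Matching up to permutation is what lets the procedure pool prior vectors such as (1, 2, ·) and (2, 1, ·), since the first $r$ components are exchangeable given $Y_i$.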

5.2.  Supplementary Sample Method.

Suppose that the random variable $X$ has an unknown conditional
distribution function $F_\lambda(x)$ given $\Lambda = \lambda$ which may be specified by
an unknown probability mass function $P_\lambda(x)$ (i.e. $X$ is a discrete
random variable). One is required to use the observed value of $X$ to
obtain an estimate of $\lambda$, where $\Lambda$ has distribution $G(\lambda)$, a specific
but unknown member of

(5.2.1)  $\mathcal{G} = \{G(\lambda) : E\Lambda^2 = \int \lambda^2\, dG(\lambda) < \infty\}$ .

Suppose that in our replications we have a supplementary sample
$y$ which is the realization of the random variable $Y$ with conditional
distribution function $H_\lambda(y)$ given $\Lambda = \lambda$, a specific but unknown
member of

(5.2.2)  $\mathcal{H} = \{H_\lambda(y) : E_\lambda Y = \int y\, dH_\lambda(y) = \lambda , \text{ and } EY^2 = \int\!\!\int y^2\, dH_\lambda(y)\, dG(\lambda) < \infty\}$ .

Thus the expectation of $Y$ is the realization of $\Lambda$ for that replication
and not necessarily the present value of $\Lambda$. Thus after an action has
been taken we can observe with error, perhaps much later, the value of
$\lambda$. For example, we may observe later the consequences of the use of
some product.

Assume that $X$ and $Y$ are conditionally independent and that
the replications of the problem are independent and identically distributed.
Then the joint distribution of $(X, Y)$ for $\Lambda = \lambda$ is given by

(5.2.3)  $F_\lambda(x)\, H_\lambda(y)$ ,


and if $(\Lambda_i, X_i, Y_i)$ represents the i'th replication, then
$(\Lambda_1, X_1, Y_1, \Lambda_2, X_2, Y_2, \ldots, \Lambda_n, X_n, Y_n)$ has the distribution

(5.2.4)  $\prod_{i=1}^{n} G(\lambda_i)\, F_{\lambda_i}(x_i)\, H_{\lambda_i}(y_i)$ .

From (5.1.6) with $r = 1$, we saw that the estimate of $\lambda$ which minimizes
the expected squared error $R(\psi) = E(\psi(X) - \Lambda)^2$, where $\psi(X)$ is an
estimator of $\lambda$, is

(5.2.5)  $\psi_\mu(x) = E(\Lambda \mid X = x)$ ,

and from (5.1.8) the Bayes risk is

(5.2.6)  $R(\psi_\mu) = E\{\mathrm{Var}(\Lambda \mid X)\} < \infty$ .

The "regret" is defined to be

(5.2.7)  $r(\psi) = R(\psi) - R(\psi_\mu) = E[\psi(X) - E(\Lambda \mid X)]^2$ .

Assuming that $E(\Lambda \mid x)$ is not known to us, we wish to use the
2n-tuple of values $\mathbf{z}_n = (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)$, the realization of

(5.2.8)  $\mathbf{Z}_n = (X_1, Y_1, X_2, Y_2, \ldots, X_n, Y_n)$

for $n$ previous occurrences of the problem in order to obtain an estimate

$\psi_n(\mathbf{z}_n; x)$ of $\lambda$. Note here that the expectations must be taken over
$\mathbf{Z}_n$ as well as $X$ and $\Lambda$ from now on, since $r(\psi_n)$ is an unconditional
expectation. $X$ and $\Lambda$ are independent of $\mathbf{Z}_n$ by the assumption above.

Let

(5.2.9)  $\delta(x_i; x) = \begin{cases} 1 , & \text{if } x_i = x , \\ 0 , & \text{if } x_i \ne x , \end{cases}$

and

(5.2.10)  $m_n(\mathbf{z}_n; x) = \sum_{i=1}^{n} \delta(x_i; x)$ .

Define

(5.2.11)  $\psi_n(\mathbf{z}_n; x) = \begin{cases} \dfrac{1}{m_n(\mathbf{z}_n; x)} \sum_{i=1}^{n} \delta(x_i; x)\, y_i , & \text{if } m_n(\mathbf{z}_n; x) > 0 , \\ 0 , & \text{if } m_n(\mathbf{z}_n; x) = 0 . \end{cases}$

Now

(5.2.12)
$$\begin{aligned}
E(\Lambda \mid x) &= E\{E(Y \mid \Lambda) \mid X = x\} \quad \text{a.s.} \\
&= E\{E(Y \mid \Lambda, X) \mid X = x\} \quad \text{a.s.} \\
&= E(Y \mid x) \quad \text{a.s.}
\end{aligned}$$

since $E(Y \mid \Lambda) = \Lambda$ from (5.2.2), and $X$ and $Y$ are conditionally
independent by assumption. In the third step we integrate with respect
to $\Lambda$ only.

When $m_n(\mathbf{z}_n; x) = m > 0$, $\psi_n(\mathbf{z}_n; x)$ is the average of $m$
independent unbiased estimates of $E(Y \mid x) = E(\Lambda \mid x)$. Therefore


(5.2.13)  $E[\psi_n(\mathbf{Z}_n; x) \mid m_n = m > 0,\ X = x] = \psi_\mu(x)$ ,

(5.2.14)  $E[\psi_n(\mathbf{Z}_n; x) \mid m_n = 0,\ X = x] = 0$ ,

(5.2.15)  $E[(\psi_n(\mathbf{Z}_n; x) - \psi_\mu(x))^2 \mid m_n = m > 0,\ X = x] = \dfrac{1}{m} \mathrm{Var}(Y \mid x)$ ,

and

(5.2.16)  $E[(\psi_n(\mathbf{Z}_n; x) - \psi_\mu(x))^2 \mid m_n = 0,\ X = x] = \psi_\mu^2(x)$ .

If we replace $\psi$ by $\psi_n$ in (5.2.7), we see that

(5.2.17)
$$\begin{aligned}
r(\psi_n) &= E[\psi_n(\mathbf{Z}_n; X) - \psi_\mu(X)]^2 \\
&= E\{E[(\psi_n(\mathbf{Z}_n; X) - \psi_\mu(X))^2 \mid X = x]\} \\
&= E\{E[E((\psi_n(\mathbf{Z}_n; X) - \psi_\mu(X))^2 \mid m_n, X) \mid X = x]\} \\
&= E\Big\{E\Big[\sum_{m=1}^{n} \frac{1}{m} \mathrm{Var}(Y \mid X)\, \mathrm{Prob}(m_n(\mathbf{Z}_n; X) = m) + \psi_\mu^2(X)\, \mathrm{Prob}(m_n(\mathbf{Z}_n; X) = 0) \,\Big|\, X = x\Big]\Big\} \\
&= E\Big\{\mathrm{Var}(Y \mid X)\, E\Big[\sum_{m=1}^{n} \frac{1}{m} \mathrm{Prob}(m_n(\mathbf{Z}_n; X) = m) \,\Big|\, X = x\Big] + \psi_\mu^2(X)\, E[\mathrm{Prob}(m_n(\mathbf{Z}_n; X) = 0) \mid X = x]\Big\} .
\end{aligned}$$

Also,


(5.2.18)
$$\begin{aligned}
r(\psi_n) &\le E\{\mathrm{Var}(Y \mid X) + \psi_\mu^2(X)\} \\
&= E\{E(Y^2 \mid X) - E^2(Y \mid X) + E^2(\Lambda \mid X)\} \\
&= E\{E(Y^2 \mid X)\} < \infty
\end{aligned}$$

because of (5.2.2).


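Now for any $\mathbf{Z}_n$ such that $m_n(\mathbf{Z}_n; x) = s$,

(5.2.19)  $\mathrm{Prob}(m_n(\mathbf{Z}_n; x) = s) = \binom{n}{s} P^s(x)\,(1 - P(x))^{n-s}$ ,

where

(5.2.20)  $P(x) = \int P_\lambda(x)\, dG(\lambda)$ ,

so that (5.2.19) is the probability that there are exactly $s$ successes in
$n$ independent trials with probability $P(x)$ for success. Then

(5.2.21)  $E[\mathrm{Prob}(m_n(\mathbf{Z}_n; X) = 0) \mid X = x] = (1 - P(x))^n$ ,

and hence

(5.2.22)  $\lim_{n \to \infty} E[\mathrm{Prob}(m_n(\mathbf{Z}_n; X) = 0) \mid X = x] = 0$  a.s.

Also, since $1/m \le 2/(m+1)$ for $m \ge 1$ and $\frac{1}{m+1}\binom{n}{m} = \frac{1}{n+1}\binom{n+1}{m+1}$,

(5.2.23)
$$\begin{aligned}
E\Big[\sum_{m=1}^{n} \frac{1}{m} \mathrm{Prob}(m_n(\mathbf{Z}_n; X) = m) \,\Big|\, X = x\Big]
&= \sum_{m=1}^{n} \frac{1}{m} \binom{n}{m} P^m(x)\,(1 - P(x))^{n-m} \\
&\le \sum_{m=1}^{n} \frac{2}{m+1} \binom{n}{m} P^m(x)\,(1 - P(x))^{n-m} \\
&= \frac{2}{(n+1)P(x)} \sum_{m=1}^{n} \binom{n+1}{m+1} P^{m+1}(x)\,(1 - P(x))^{n-m} \\
&\le \frac{2}{(n+1)P(x)} .
\end{aligned}$$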
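The bound in (5.2.23) can be checked numerically. The sketch below reads the bound as $2/((n+1)P(x))$, which is what the elementary inequality $1/m \le 2/(m+1)$ yields; that constant is an assumption about the garbled original, and the function names are illustrative.

```python
from math import comb

def mean_inverse_count(n, p):
    """sum_{m=1}^{n} (1/m) * C(n, m) * p^m * (1 - p)^(n - m)."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) / m for m in range(1, n + 1))

def upper_bound(n, p):
    """Assumed reading of the bound in (5.2.23): 2 / ((n + 1) * p)."""
    return 2.0 / ((n + 1) * p)

# Compare the exact sum with the bound for a few sample sizes and success probabilities.
checks = [(n, p, mean_inverse_count(n, p), upper_bound(n, p))
          for n in (1, 5, 50, 200) for p in (0.05, 0.3, 0.9)]
```

For fixed $p > 0$ both quantities shrink roughly like $1/(np)$ as $n$ grows, which is what drives (5.2.24).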


Then

(5.2.24)  $\lim_{n \to \infty} E\Big[\sum_{m=1}^{n} \frac{1}{m} \mathrm{Prob}(m_n(\mathbf{Z}_n; X) = m) \,\Big|\, X = x\Big] = 0$  a.s. ,

and by (5.2.17), (5.2.18), (5.2.22), (5.2.24), and the Lebesgue dominated
convergence theorem, we see that

(5.2.25)  $\lim_{n \to \infty} r(\psi_n) = 0$ .

Thus the risk of $\psi_n(\mathbf{Z}_n; x)$ attains the risk of the Bayes estimator as
$n \to \infty$, and hence $\psi_n(\mathbf{Z}_n; x)$ is an asymptotically optimal estimator.
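The supplementary sample estimator (5.2.11) is simply the mean of the supplementary values $y_i$ over past replications whose $x_i$ equals the current $x$. A minimal sketch follows; the data generator is a hypothetical model (two-point prior, binomial $X$, Bernoulli $Y$) chosen only so that $E(Y \mid \Lambda) = \Lambda$ holds, and none of the names come from the source.

```python
import random

def supplementary_estimate(x, pairs):
    """psi_n(z_n; x) of (5.2.11): mean of y_i over prior pairs with x_i == x, else 0."""
    matched = [y_i for x_i, y_i in pairs if x_i == x]
    return sum(matched) / len(matched) if matched else 0.0

def simulate_pairs(n, seed=0):
    """Hypothetical replications: Lambda is 0.2 or 0.7 with equal weight,
    X | Lambda ~ Binomial(3, Lambda) and Y | Lambda ~ Bernoulli(Lambda),
    drawn independently given Lambda so that E(Y | Lambda) = Lambda."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        lam = rng.choice([0.2, 0.7])
        x_i = sum(rng.random() < lam for _ in range(3))
        y_i = 1.0 if rng.random() < lam else 0.0
        pairs.append((x_i, y_i))
    return pairs

# Exact posterior mean E(Lambda | X = 2) in the hypothetical model, for comparison.
true_posterior_mean = (0.096 * 0.2 + 0.441 * 0.7) / (0.096 + 0.441)
estimate = supplementary_estimate(2, simulate_pairs(20000))
```

With 20000 simulated replications the estimate should lie close to the exact posterior mean of about 0.61, in line with the asymptotic optimality shown in (5.2.25).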


5.3.  Estimation: Continuous Case.

Johns [6] also considers the case where the observed $X$'s
possess absolutely continuous distribution functions. Let
$\mathcal{F}_2 = \{F_\omega(x) : \omega \in \Omega\}$ be the class of all absolutely continuous
conditional distribution functions, where $\Omega = \{\omega\}$ is an abstract indexing
set. If $(\Omega, \mathcal{A}, \mu)$ is an a priori probability measure space where
$\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$, then there exists a function
$f_\omega(u)$ defined on the product space (reals) $\times\ \Omega$ such that

(5.3.1)  $F_\omega(x) = \int_{-\infty}^{x} f_\omega(u)\, du$

for each $\omega \in \Omega$. Assume that $(\Omega, \mathcal{A}, \mu)$ is such that the function
$f_\omega(u)$ is a measurable function on the product space (reals) $\times\ \Omega$.
As before $Y = Y(\omega)$ is the $\Omega$-valued random variable which is the identity
mapping of $\Omega$ onto itself. Let $\mathbf{X} = (X_1, X_2, \ldots, X_r)$ where the $X_j$,
$j = 1, 2, \ldots, r$, are random variables which are conditionally independent
and identically distributed according to $F_\omega(x)$ given that $Y = \omega$.

Define the random variable $\Lambda$ for a given measurable function
$h(x)$ by

(5.3.2)  $\Lambda = \Lambda(Y) = E(h(X) \mid Y)$ ,

where $X$ is a generic representation of the $X_j$'s, and assume

(5.3.3)  $E\,h^2(X) < \infty$ .

As in the discrete case, the Bayes estimator of $\lambda$ using $\mathbf{X}$, where the
risk is the expected squared error, is given by

(5.3.4)  $\psi_\mu(\mathbf{x}) = E(\Lambda \mid \mathbf{X} = \mathbf{x})$ .


As before we have the random vectors of prior observations
$\mathbf{X}_i = (X_{i,1}, X_{i,2}, \ldots, X_{i,r+1})$ for $i = 1, 2, \ldots, n$, the $\mathbf{X}_i$'s being
independent of each other and of $\mathbf{X}$, and where for each $i$ the $X_{i,j}$'s
are conditionally independent and identically distributed according to
$F_{\omega_i}(x)$ given that $Y_i = \omega_i$, where the $Y_i$, $i = 1, 2, \ldots, n$, are independent
random variables and independent of $Y$ such that each has the same
distribution as $Y$. If $\mathbf{X}_i^{(r)} = (X_{i,1}, X_{i,2}, \ldots, X_{i,r})$, $i = 1, 2, \ldots, n$,
then, as before

(5.3.5)  $E(h(X_{i,r+1}) \mid \mathbf{X}_i^{(r)} = \mathbf{x}) = E(\Lambda \mid \mathbf{X} = \mathbf{x}) = \psi_\mu(\mathbf{x})$ .


The $X$'s can be made discrete in the following way. Consider
the double sequence of half-open intervals

(5.3.6)  $I_t^{(n)} = \left[ \dfrac{tc}{n^{\delta/r}},\ \dfrac{(t+1)c}{n^{\delta/r}} \right)$ ,  $t = 0, \pm 1, \pm 2, \ldots$ ;  $n = 1, 2, \ldots$ ,

where $c > 0$ and $0 < \delta < 1$. For each $n$ partition r-dimensional
Euclidean space into a countable sequence of non-overlapping hypercubes
$C_j^{(n)}$ where

(5.3.7)  $C_j^{(n)} = I_{t_{1,j}}^{(n)} \times I_{t_{2,j}}^{(n)} \times \cdots \times I_{t_{r,j}}^{(n)}$ ,  $j = 1, 2, \ldots$ ,

where the $t_{\ell,j}$'s are suitably chosen integers. For each $n$ let
$C^{(n)}(\mathbf{x})$ be the unique member of the sequence (5.3.7) which contains the
r-component numerical vector $\mathbf{x} = (x_1, x_2, \ldots, x_r)$. If $\mathbf{x}_{(q)}$,
$q = 1, 2, \ldots, m(\mathbf{x})$, $m(\mathbf{x}) \ge 1$, are the distinct vectors obtained
by permuting the components of $\mathbf{x}$, we have, analogous to (5.1.10) and
(5.1.11),




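(5.3.8)  $M_i^{(n)}(\mathbf{x}) = \begin{cases} 1 , & \text{if there exists a } q,\ 1 \le q \le m(\mathbf{x}), \text{ such that } \mathbf{X}_i^{(r)} \in C^{(n)}(\mathbf{x}_{(q)}) , \\ 0 , & \text{otherwise} , \end{cases}$

for $i = 1, 2, \ldots, n$, and

(5.3.9)  $M^{(n)}(\mathbf{x}) = \sum_{i=1}^{n} M_i^{(n)}(\mathbf{x})$ .

If we define the empirical Bayes estimator $\psi_n(\mathbf{x})$ by

(5.3.10)  $\psi_n(\mathbf{x}) = \begin{cases} \dfrac{1}{M^{(n)}(\mathbf{x})} \sum_{i=1}^{n} M_i^{(n)}(\mathbf{x})\, h(X_{i,r+1}) , & M^{(n)}(\mathbf{x}) > 0 , \\ 0 , & \text{otherwise} , \end{cases}$

then Johns [6] proves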


Theorem 7.  Let $X$ be a generic representation of $r \ge 1$ random
variables which are independent and identically distributed according
to $F_\omega \in \mathcal{F}_2$, and let there be a measurable function $h(x)$ such that
(5.3.2) holds. Suppose $(\Omega, \mathcal{A}, \mu)$ is such that there exists a measurable
function $f_\omega$ such that (5.3.1) holds. Let $\mathbf{X}$ be the vector of
the present $r$ observations, and let $\mathbf{X}_i$, $i = 1, 2, \ldots, n$, be the prior
observations, where the $\mathbf{X}_i$'s are mutually independent and independent
of $\mathbf{X}$ and where each $\mathbf{X}_i$ consists of $r+1$ independent random variables
which are identically distributed according to $F_{\omega_i} \in \mathcal{F}_2$. If
$(\Omega, \mathcal{A}, \mu)$ is such that (5.3.3) holds, then, using loss function (5.1.3),

(5.3.11)  $\lim_{n \to \infty} R(\psi_n) = R(\psi_\mu)$ ,

where $R(\psi_\mu)$ is the risk of the Bayes estimator and $R(\psi_n)$ is the
risk of the empirical Bayes estimator defined in (5.3.10).
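For $r = 1$ the continuous-case estimator (5.3.10) reduces to averaging $h$ of the second component over prior pairs whose first component falls in the same interval as the current observation. The sketch below assumes the intervals of (5.3.6) have width $c\,n^{-\delta/r}$ and start at multiples of that width; this reading of the garbled original, the default constants, and the function names are all assumptions.

```python
import math

def bin_index(value, n, c=1.0, delta=0.5, r=1):
    """Index t of the half-open interval [t*w, (t+1)*w) containing value,
    with assumed width w = c * n**(-delta / r)."""
    width = c * n ** (-delta / r)
    return math.floor(value / width)

def binned_estimate(x, prior_pairs, h=lambda v: v, c=1.0, delta=0.5):
    """Sketch of (5.3.10) for r = 1: prior_pairs holds (x_i1, x_i2) pairs."""
    n = len(prior_pairs)
    target = bin_index(x, n, c, delta)
    hits = [h(second) for first, second in prior_pairs
            if bin_index(first, n, c, delta) == target]   # M_i^(n)(x) = 1
    return sum(hits) / len(hits) if hits else 0.0

# Usage: with n = 4, c = 1 and delta = 0.5 the interval width is 0.5.
prior = [(0.10, 1.0), (0.20, 3.0), (0.90, 10.0), (1.60, 7.0)]
estimate = binned_estimate(0.3, prior)
```

Because $0 < \delta < 1$, the interval width shrinks while the expected number of prior observations per interval still grows with $n$, which is the trade-off the discretization relies on.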


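5.4.  Application to Hypothesis Testing.

The empirical Bayes estimation procedures just described can
be applied to two-decision problems of the hypothesis testing type.
Suppose we wish to test a hypothesis of the type (2.3.12); i.e.,
$H : \lambda \le \lambda^{*}$, with loss function (2.3.11); i.e.

$L(d_0, \lambda) = \begin{cases} 0 , & \text{if } \lambda \le \lambda^{*} , \\ \lambda - \lambda^{*} , & \text{if } \lambda > \lambda^{*} , \end{cases}$

$L(d_1, \lambda) = \begin{cases} \lambda^{*} - \lambda , & \text{if } \lambda \le \lambda^{*} , \\ 0 , & \text{if } \lambda > \lambda^{*} , \end{cases}$

where $\lambda^{*}$ is a fixed constant.

If $\psi(\mathbf{x})$ is the function which assigns a decision $\psi(\mathbf{x}) = d_0$
or $d_1$ to each possible value $\mathbf{x} = (x_1, x_2, \ldots, x_r)$ of the random variable
$\mathbf{X} = (X_1, X_2, \ldots, X_r)$, then let

(5.4.1)  $\delta(\mathbf{x}) = \begin{cases} 0 , & \text{when } \psi(\mathbf{x}) = d_0 , \\ 1 , & \text{when } \psi(\mathbf{x}) = d_1 . \end{cases}$

Then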


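(5.4.2)
$$\begin{aligned}
L(\psi(\mathbf{x}), \lambda) &= \delta(\mathbf{x})\, L(d_1, \lambda) + (1 - \delta(\mathbf{x}))\, L(d_0, \lambda) \\
&= L(d_0, \lambda) - \delta(\mathbf{x}) [L(d_0, \lambda) - L(d_1, \lambda)] \\
&= L(d_0, \lambda) - \delta(\mathbf{x}) [\lambda - \lambda^{*}] .
\end{aligned}$$

The risk is then

(5.4.3)
$$\begin{aligned}
R(\psi) &= E\,L(\psi(\mathbf{X}), \Lambda) \\
&= E\,L(d_0, \Lambda) - E[\delta(\mathbf{X})(\Lambda - \lambda^{*})] \\
&= E\,L(d_0, \Lambda) - E\{\delta(\mathbf{X})\, E[(\Lambda - \lambda^{*}) \mid \mathbf{X}]\} \\
&= E\,L(d_0, \Lambda) - E\{\delta(\mathbf{X}) [E(\Lambda \mid \mathbf{X}) - \lambda^{*}]\} .
\end{aligned}$$

The function $\psi_\mu(\mathbf{x})$ which minimizes $R(\psi)$ is the $\psi_\mu(\mathbf{x})$ corresponding
to

(5.4.4)  $\delta_\mu(\mathbf{x}) = \begin{cases} 0 , & \text{if } \psi_\mu(\mathbf{x}) = E(\Lambda \mid \mathbf{X} = \mathbf{x}) \le \lambda^{*} , \\ 1 , & \text{if } \psi_\mu(\mathbf{x}) = E(\Lambda \mid \mathbf{X} = \mathbf{x}) > \lambda^{*} . \end{cases}$

If we consider the supplementary sample approach, then $r = 1$
above, and the Bayes risk is then

(5.4.5)  $R(\psi_\mu) \le E\,L(d_0, \Lambda) \le E|\Lambda - \lambda^{*}| < \infty$

due to (5.2.1). The regret in using arbitrary $\psi(x)$ with corresponding
$\delta(x)$ is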


(5.4.6)  $r(\psi) = R(\psi) - R(\psi_\mu) = E\{(\delta_\mu(X) - \delta(X))(\psi_\mu(X) - \lambda^{*})\}$ .

Using our empirical Bayes estimate $\psi_n(\mathbf{z}_n; x)$ defined in (5.2.11),
we define a supplementary sample non-parametric empirical Bayes test of
the hypothesis as the $\psi_n(\mathbf{z}_n; x)$ such that

(5.4.7)  $\delta_n(\mathbf{z}_n; x) = \begin{cases} 0 , & \text{if } \psi_n(\mathbf{z}_n; x) \le \lambda^{*} , \\ 1 , & \text{if } \psi_n(\mathbf{z}_n; x) > \lambda^{*} . \end{cases}$

The regret becomes

(5.4.8)  $r(\psi_n) = E\{(\delta_\mu(X) - \delta_n(\mathbf{Z}_n; X))(\psi_\mu(X) - \lambda^{*})\}$ .

Now

(5.4.9)  $\delta_\mu(x) - \delta_n(\mathbf{z}_n; x) = 1$ only when $\psi_\mu(x) - \lambda^{*} > 0$ ,

and

(5.4.10)  $\delta_\mu(x) - \delta_n(\mathbf{z}_n; x) = -1$ only when $\psi_\mu(x) - \lambda^{*} \le 0$ .

Also,

(5.4.11)  $\delta_\mu(x) - \delta_n(\mathbf{z}_n; x) \ne 0$ only when $\psi_\mu(x) - \lambda^{*} > 0$ and $\psi_n(\mathbf{z}_n; x) - \lambda^{*} \le 0$ ,
or $\psi_\mu(x) - \lambda^{*} \le 0$ and $\psi_n(\mathbf{z}_n; x) - \lambda^{*} > 0$ .


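Then (5.4.8), (5.4.9), (5.4.10), and (5.4.11) imply that

(5.4.12)
$$\begin{aligned}
r(\psi_n) &= E\{|\delta_\mu(X) - \delta_n(\mathbf{Z}_n; X)| \cdot |\psi_\mu(X) - \lambda^{*}|\} \\
&\le E|(\psi_\mu(X) - \lambda^{*}) - (\psi_n(\mathbf{Z}_n; X) - \lambda^{*})| \\
&= E|\psi_\mu(X) - \psi_n(\mathbf{Z}_n; X)| ,
\end{aligned}$$

since, whenever the two decisions differ, $\psi_\mu(X) - \lambda^{*}$ and
$\psi_n(\mathbf{Z}_n; X) - \lambda^{*}$ do not have the same sign.

From (5.2.7) and (5.2.25) we have proved that

(5.4.13)  $\lim_{n \to \infty} E[\psi_n(\mathbf{Z}_n; X) - \psi_\mu(X)]^2 = 0$ ,

which implies that

(5.4.14)  $\lim_{n \to \infty} r(\psi_n) = 0$ .

Thus $\lim_{n \to \infty} R(\psi_n) = R(\psi_\mu)$, and hence $\psi_n(\mathbf{z}_n; x)$ defined by (5.2.11)
such that (5.4.7) holds is an asymptotically optimal test of the hypothesis.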
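The test (5.4.7) thresholds the empirical Bayes estimate at $\lambda^{*}$, and the argument behind (5.4.12) rests on a pointwise inequality: the regret contribution at a point never exceeds the distance between the Bayes and empirical Bayes estimates. The sketch below checks that inequality on a grid; the function names and the grid values are illustrative assumptions.

```python
def one_sided_test(psi_value, lam_star):
    """Return 0 for decision d_0 (accept lambda <= lam_star), 1 for d_1, as in (5.4.7)."""
    return 1 if psi_value > lam_star else 0

def regret_contribution(psi_mu, psi_n, lam_star):
    """Integrand of (5.4.8): (delta_mu - delta_n) * (psi_mu - lam_star) at one point."""
    difference = one_sided_test(psi_mu, lam_star) - one_sided_test(psi_n, lam_star)
    return difference * (psi_mu - lam_star)

# Grid check of 0 <= contribution <= |psi_mu - psi_n| for a hypothetical lam_star.
LAM_STAR = 0.3
grid = [k / 10 for k in range(-20, 21)]
worst_gap = max(regret_contribution(a, b, LAM_STAR) - abs(a - b)
                for a in grid for b in grid)
```

A non-positive `worst_gap` confirms, on this grid, that the integrand is bounded by the estimation error.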


In a similar manner Johns [6] showed originally that if

(5.4.15)  $\delta_n(\mathbf{x}) = \begin{cases} 0 , & \text{if } \psi_n(\mathbf{x}) \le \lambda^{*} , \\ 1 , & \text{if } \psi_n(\mathbf{x}) > \lambda^{*} , \end{cases}$

where $\psi_n(\mathbf{x})$ is defined by (5.1.12), then $\lim_{n \to \infty} R(\delta_n) = R(\delta_\mu)$
whenever the a priori probability measure is such that $\psi_n(\mathbf{x}) \xrightarrow{P} E(\Lambda \mid \mathbf{X} = \mathbf{x})$
for all $\mathbf{x}$ in some set $S$ which is assigned probability one under the
distribution of $\mathbf{X}$, and $E|\Lambda| < \infty$. Thus it is clear how to obtain
an asymptotically optimal test of the hypothesis in the case where an
unbiased estimate of $\lambda$ in the distribution of $X$ is known at the time
the estimate is to be made.


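For this case Johns also shows how to obtain a test of a
hypothesis of the type (2.3.13); i.e. $H : |\lambda - \lambda^{*}| \le \Delta$, where
$\lambda^{*}$ and $\Delta > 0$ are fixed, and with loss function (2.3.14); i.e.

$L(d_0, \lambda) = \begin{cases} (\lambda - \lambda^{*})^2 - \Delta^2 , & \text{if } |\lambda - \lambda^{*}| > \Delta , \\ 0 , & \text{if } |\lambda - \lambda^{*}| \le \Delta , \end{cases}$

$L(d_1, \lambda) = \begin{cases} 0 , & \text{if } |\lambda - \lambda^{*}| > \Delta , \\ \Delta^2 - (\lambda - \lambda^{*})^2 , & \text{if } |\lambda - \lambda^{*}| \le \Delta . \end{cases}$

If $\delta(\mathbf{x})$ is defined by (5.4.1), we see, as before, that the risk is

(5.4.16)
$$\begin{aligned}
R(\psi) &= E\,L(\psi(\mathbf{X}), \Lambda) \\
&= E\,L(d_0, \Lambda) - E[\delta(\mathbf{X})((\Lambda - \lambda^{*})^2 - \Delta^2)] \\
&= E\,L(d_0, \Lambda) - E\{\delta(\mathbf{X}) [E(\Lambda^2 \mid \mathbf{X}) - 2\lambda^{*} E(\Lambda \mid \mathbf{X}) + \lambda^{*2} - \Delta^2]\} .
\end{aligned}$$

Therefore the function $\psi_\mu(\mathbf{x})$ which minimizes $R(\psi)$ is the $\psi_\mu(\mathbf{x})$
corresponding to

(5.4.17)  $\delta_\mu(\mathbf{x}) = \begin{cases} 0 , & \text{if } \psi_\mu(\mathbf{x}) = E(\Lambda^2 \mid \mathbf{X} = \mathbf{x}) - 2\lambda^{*} E(\Lambda \mid \mathbf{X} = \mathbf{x}) \le \Delta^2 - \lambda^{*2} , \\ 1 , & \text{if } \psi_\mu(\mathbf{x}) = E(\Lambda^2 \mid \mathbf{X} = \mathbf{x}) - 2\lambda^{*} E(\Lambda \mid \mathbf{X} = \mathbf{x}) > \Delta^2 - \lambda^{*2} . \end{cases}$

Johns then shows that if we can find empirical Bayes estimates $\psi_n^{(1)}(\mathbf{x})$
and $\psi_n^{(2)}(\mathbf{x})$ based on the prior independent observations such that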


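(5.4.18)  $\psi_n^{(1)}(\mathbf{x}) \xrightarrow{P} E(\Lambda \mid \mathbf{X} = \mathbf{x})$ ,

and

(5.4.19)  $\psi_n^{(2)}(\mathbf{x}) \xrightarrow{P} E(\Lambda^2 \mid \mathbf{X} = \mathbf{x})$ ,

for all $\mathbf{x} \in S$, and if $E\Lambda^2 < \infty$ and we define

(5.4.20)  $\delta_n(\mathbf{x}) = \begin{cases} 0 , & \text{if } \psi_n(\mathbf{x}) = \psi_n^{(2)}(\mathbf{x}) - 2\lambda^{*} \psi_n^{(1)}(\mathbf{x}) \le \Delta^2 - \lambda^{*2} , \\ 1 , & \text{if } \psi_n(\mathbf{x}) = \psi_n^{(2)}(\mathbf{x}) - 2\lambda^{*} \psi_n^{(1)}(\mathbf{x}) > \Delta^2 - \lambda^{*2} , \end{cases}$

then

$\lim_{n \to \infty} R(\delta_n) = R(\delta_\mu)$ .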


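An empirical Bayes estimate $\psi_n^{(1)}(\mathbf{x})$ for $E(\Lambda \mid \mathbf{X} = \mathbf{x})$ is given
by (5.1.12). To get an empirical Bayes estimate of $E(\Lambda^2 \mid \mathbf{X} = \mathbf{x})$, we
assume that $E\,h^4(X) < \infty$ and that the number of components in the vector
of prior observations $\mathbf{X}_i$ for $i = 1, 2, \ldots, n$ exceeds the number in
$\mathbf{X}$ by at least two. It can then be shown that

(5.4.21)  $E(h(X_{i,r+1})\, h(X_{i,r+2}) \mid \mathbf{X}_i^{(r)} = \mathbf{x}) = E(\Lambda^2 \mid \mathbf{X} = \mathbf{x})$

and that

(5.4.22)  $\psi_n^{(2)}(\mathbf{x}) = \begin{cases} \dfrac{1}{M_n(\mathbf{x})} \sum_{i=1}^{n} M_i(\mathbf{x})\, h(X_{i,r+1})\, h(X_{i,r+2}) , & M_n(\mathbf{x}) > 0 , \\ 0 , & \text{otherwise} \end{cases}$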


is such that (5.4.19) holds. Therefore it is clear how to obtain an
asymptotically optimal test of the hypothesis (2.3.13) with loss function
(2.3.14).
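For $r = 1$ the two-sided test combines two matched averages: the mean of $h(X_{i,2})$ estimates $E(\Lambda \mid X = x)$ and the mean of the products $h(X_{i,2})\,h(X_{i,3})$ estimates $E(\Lambda^2 \mid X = x)$. A minimal sketch under those assumptions follows; the function names, the identity default for `h`, and the symbol `half_width` for the tolerance in the hypothesis are illustrative only.

```python
def posterior_moment_estimates(x, prior_triples, h=lambda v: v):
    """Return (psi1, psi2) for r = 1 from prior triples (x_i1, x_i2, x_i3).

    psi1 averages h(x_i2) and psi2 averages h(x_i2) * h(x_i3)
    over the triples whose first component equals x; both are 0 if none match.
    """
    matched = [(h(b), h(c)) for a, b, c in prior_triples if a == x]
    if not matched:
        return 0.0, 0.0
    psi1 = sum(hb for hb, _ in matched) / len(matched)
    psi2 = sum(hb * hc for hb, hc in matched) / len(matched)
    return psi1, psi2

def two_sided_test(x, prior_triples, lam_star, half_width, h=lambda v: v):
    """Return 0 (accept |lambda - lam_star| <= half_width) or 1, following (5.4.20)."""
    psi1, psi2 = posterior_moment_estimates(x, prior_triples, h)
    statistic = psi2 - 2 * lam_star * psi1
    return 1 if statistic > half_width**2 - lam_star**2 else 0

# Usage: two prior triples match x = 1, giving psi1 = 3.0 and psi2 = 8.0.
triples = [(1, 2.0, 4.0), (1, 4.0, 2.0), (2, 9.0, 9.0)]
moments = posterior_moment_estimates(1, triples)
```

Using the product of two further observations rather than the square of one is what makes the second average target $E(\Lambda^2 \mid X = x)$ instead of $E(h^2(X) \mid X = x)$.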


5.5.  Non-parametric Empirical Bayes Approach for Selecting the Best
of k Populations.

The non-parametric empirical Bayes approach can be applied to
the problem of selecting the best of $k$ populations by dropping the
previous assumptions of Chapter four that $f_{\lambda_i}$ and $G_i$ are members
of specific parametric families with unknown parameters. It will be
shown that only mild assumptions on the first two moments of the
conditional distribution of $X_i$ given $\Lambda_i$ are necessary to derive
empirical Bayes procedures which select the best of $k$ populations when
(1) $f_{\lambda_i}$ is discrete; (2) $f_{\lambda_i}$ is continuous.

Define the class $\mathcal{G}_p$ to be

(5.5.1)  $\mathcal{G}_p = \Big\{G = \prod_{i=1}^{k} G_i : \int_\Omega |\lambda_i|^p\, dG_i(\lambda_i) < \infty \text{ for all } i = 1, 2, \ldots, k\Big\}$ .

Suppose we have one observation per population; i.e. $r = 1$ in chapter
four. Let $G$ be in $\mathcal{G}_p$ for some $p \ge 2$, and let the loss function
be given by (4.2.2). It follows from the results of section 4.6 that
if $\xi_{n,i}(x_i)$ is a function of the prior observations $x_{1,i}, x_{2,i}, \ldots, x_{n,i}$
from the i'th population such that

(5.5.2)  $\xi_{n,i}(x_i) \xrightarrow{P} E(\Lambda_i \mid x_i)$


for each $i = 1, 2, \ldots, k$, then the empirical Bayes procedure for selecting
the best of $k$ populations with respect to any $G \in \mathcal{G}_p$ is given
by

(5.5.3)  $\delta_n(\mathbf{x}) = d_j$ where $j$ is any integer $1, 2, \ldots, k$ such that

$\xi_{n,j}(x_j) = \max_{1 \le i \le k} \{\xi_{n,i}(x_i)\}$ .

For each population, a function of the prior observations
from that population which converges in probability to the a posteriori
mean can be found when $f_\lambda$ is discrete or continuous. Assume for both
cases that

(5.5.4)  $E(X_i \mid \Lambda_i) = \Lambda_i$ ,

and

(5.5.5)  $E(|X_i|^p \mid \Lambda_i) = c_1 + c_2 |\Lambda_i|^p$

for constants $c_1$, $c_2$, and $p \ge 2$. The subscript "i" which indicated
the i'th population will now be dropped for convenience since it is
desired to find a consistent estimate for the a posteriori mean for a
typical population.

Assume that each of the prior observations is a vector of two
independent observations while $\Lambda$ remains a constant. Thus our prior
independent observations are $\mathbf{X}_j = (X_{j,1}, X_{j,2})$ with unknown parameter
$\Lambda_j$ for $j = 1, 2, \ldots, n$. With appropriate notational changes, the


f;  ,'J  •.  i  ’  .  O  ■ 
empirical Bayes procedure given by (5.5.3) is still valid. Using (5.5.1), (5.5.4), and (5.5.5) we have

(5.5.6)
$$\begin{aligned}
E(X_{j,2} \mid X_{j,1}) &= E\{E(X_{j,2} \mid \Lambda_j, X_{j,1}) \mid X_{j,1}\} \\
&= E\{E(X_{j,2} \mid \Lambda_j) \mid X_{j,1}\} \\
&= E(\Lambda_j \mid X_{j,1}).
\end{aligned}$$

Since $X_{j,1}$ and $E(\Lambda_j \mid X_{j,1})$ have the same joint distribution as $X$ and $E(\Lambda \mid X)$, then for any observation $x$ of $X$,

(5.5.7)  $E(\Lambda_j \mid X_{j,1} = x) = E(\Lambda \mid X = x)$.


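Consider the cases where $f_\lambda$ is discrete or continuous.

Discrete case. In this case $f_\lambda(x) > 0$ for every $x \in S$, a countable set, and for every $\lambda \in \Omega$. (5.5.7) suggests the following estimation procedure. Define for given $x \in S$ and each $j = 1, 2, \ldots, n$,

(5.5.8)
$$Y_j = \begin{cases} X_{j,2}, & \text{if } X_{j,1} = x, \\ 0, & \text{otherwise}, \end{cases}$$

and

(5.5.9)
$$K_j = \begin{cases} 1, & \text{if } X_{j,1} = x, \\ 0, & \text{otherwise}. \end{cases}$$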


Then 


(5.5.10)
$$\begin{aligned}
E Y_j &= E(X_{j,2} \mid X_{j,1} = x) \cdot f_G(x) \\
&= E(\Lambda_j \mid X_{j,1} = x) \cdot f_G(x) < \infty
\end{aligned}$$

from (5.5.6) and where

$$f_G(x) = P(X = x) = \int_\Omega f_\lambda(x)\, dG(\lambda).$$


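Also, using (5.5.5) and the observed value $y$ of $X_{j,2}$, we have

$$\begin{aligned}
E Y_j^2 \le E X_{j,2}^2 &= \sum_{y \in S} y^2 f_G(y) \\
&= \sum_{y \in S} y^2 \int_\Omega f_\lambda(y)\, dG(\lambda) \\
&= \int_\Omega \Big( \sum_{y \in S} y^2 f_\lambda(y) \Big)\, dG(\lambda) \\
&\le \int_\Omega (c_1 + c_2 \lambda^p)\, dG(\lambda) < \infty
\end{aligned}$$

due to (5.5.1). Thus $Y_1, Y_2, \ldots$ are independent and identically distributed random variables with finite variance. By (5.5.7) and the strong law of large numbers,

(5.5.11)
$$\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_j \xrightarrow{\text{a.s.}} E(\Lambda \mid X = x) \cdot f_G(x).$$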


Now $\sum_{j=1}^{n} Y_j > 0$ implies that $\sum_{j=1}^{n} K_j > 0$, and we define

(5.5.12)
$$Z_n = \begin{cases} \dfrac{\sum_{j=1}^{n} Y_j}{\sum_{j=1}^{n} K_j}, & \text{if } \sum_{j=1}^{n} K_j > 0, \\[2ex] 0, & \text{otherwise}. \end{cases}$$


Since

(5.5.13)
$$\bar{K}_n = \frac{1}{n} \sum_{j=1}^{n} K_j \xrightarrow{\text{a.s.}} f_G(x) > 0,$$

we have from (5.5.11) that

(5.5.14)  $Z_n \xrightarrow{\text{a.s.}} E(\Lambda \mid X = x)$.


Thus from (5.5.3) we see that the empirical Bayes procedure is given by

(5.5.15)  $\delta_n(x) = d_j$, where $j$ is any integer $1, 2, \ldots, k$ such that
$$Z_{n,j} = \max_{1 \le i \le k} \{Z_{n,i}\},$$

where for each $i$, $Z_{n,i}$ is computed as $Z_n$ in (5.5.12) for the i'th population as the typical population.
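The discrete-case estimate (5.5.12) and the selection rule (5.5.15) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the representation of the prior observations as a list of `(x1, x2)` pairs, and the 0-based population index are all hypothetical and do not come from the text.

```python
def posterior_mean_estimate(pairs, x):
    """Z_n of (5.5.12) for one population.

    pairs : list of prior observation pairs (X_{j,1}, X_{j,2})  (assumed layout)
    x     : present observation for this population
    """
    # K_j = 1 exactly when X_{j,1} == x; on that event Y_j = X_{j,2}.
    matched_second_components = [x2 for (x1, x2) in pairs if x1 == x]
    if not matched_second_components:
        # sum of K_j is zero, so Z_n is defined to be 0
        return 0.0
    # (sum of Y_j) / (sum of K_j)
    return sum(matched_second_components) / len(matched_second_components)


def select_best(prior_pairs_by_population, present_observations):
    """Rule (5.5.15): pick a population whose Z_{n,i} is largest (0-based index)."""
    scores = [
        posterior_mean_estimate(pairs, x)
        for pairs, x in zip(prior_pairs_by_population, present_observations)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

When several populations tie for the maximum, (5.5.15) allows any of them; the sketch simply returns the first.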

Continuous case. Assume that $f_\lambda(x)$ is a continuous function on a set $S$ for any $\lambda \in \Omega$ such that $f_\lambda(x) > 0$ for any $x \in S$ and

$$\int_S f_\lambda(x)\, d\mu(x) = 1.$$

Assume also that there exists a real number $K$ such that $f_\lambda(x) \le K$ for all $x \in S$ and $\lambda \in \Omega$; i.e. $f_\lambda$ is bounded uniformly. For each $n$ partition the real axis into non-overlapping intervals $[q/n^{1/2}, (q+1)/n^{1/2})$, where $q = 0, \pm 1, \pm 2, \ldots$. For the present observation $x \in S$, let $I^{(n)}(x)$ denote the unique subinterval which contains $x$ for each $n = 1, 2, \ldots$, and define

(5.5.16)
$$Y_{n,j} = \begin{cases} n^{1/2} X_{j,2}, & \text{if } X_{j,1} \in I^{(n)}(x), \\ 0, & \text{otherwise}. \end{cases}$$


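Now if

$$f_G(z, y) = P(X_{j,1} = y,\; X_{j,2} = z) = \int_\Omega f_\lambda(z) f_\lambda(y)\, dG(\lambda)$$

and

$$f_G(z) = \int f_G(z, y)\, d\mu(y),$$

then

(5.5.17)
$$E Y_{n,j} = \int n^{1/2} z \Big\{ \int_{I^{(n)}(x)} f_G(z, y)\, d\mu(y) \Big\}\, d\mu(z) = b_n.$$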


Then by Fubini's theorem,

(5.5.18)
$$\lim_{n \to \infty} b_n = \lim_{n \to \infty} \int_\Omega \Big\{ \int z f_\lambda(z)\, d\mu(z) \Big\} \Big\{ n^{1/2} \int_{I^{(n)}(x)} f_\lambda(y)\, d\mu(y) \Big\}\, dG(\lambda),$$

and from (5.5.4),

$$\int z f_\lambda(z)\, d\mu(z) = E(X_{j,2} \mid \lambda) = \lambda.$$

Since $f_\lambda$ is continuous, we have by the first mean value theorem and by the uniform boundedness of $f_\lambda$ that there exists a $\xi_n \in I^{(n)}(x)$ such that

$$n^{1/2} \int_{I^{(n)}(x)} f_\lambda(y)\, d\mu(y) = f_\lambda(\xi_n) \le K,$$

and

$$\lim_{n \to \infty} n^{1/2} \int_{I^{(n)}(x)} f_\lambda(y)\, d\mu(y) = \lim_{n \to \infty} f_\lambda(\xi_n) = f_\lambda(x).$$

Therefore from (5.5.18),

(5.5.19)
$$\lim_{n \to \infty} b_n = \int_\Omega \lambda f_\lambda(x)\, dG(\lambda) = E(\Lambda \mid X = x) \cdot f_G(x) < \infty,$$

since $f_\lambda(x) \le K$ and $G \in \mathcal{G}_p$. Define for each $n$ and $j = 1, 2, \ldots, n$ the random variable


(5.5.20)  $Z_{n,j} = Y_{n,j} - E Y_{n,j} = Y_{n,j} - b_n$

and the random sum

(5.5.21)  $S_{n,n} = \sum_{j=1}^{n} Z_{n,j}$.

Then

(5.5.22)  $E Z_{n,j}^2 = E Y_{n,j}^2 - 2 b_n E Y_{n,j} + b_n^2 = E Y_{n,j}^2 - b_n^2$,

and

(5.5.23)
$$\begin{aligned}
E Y_{n,j}^2 \le E X_{n,2}^2 &= E(E[X_{n,2}^2 \mid \Lambda_n]) \\
&\le E[c_1 + c_2 \Lambda_n^p] \\
&= M < \infty
\end{aligned}$$

for every $n = 1, 2, \ldots$, since the $\Lambda_j$'s are independent and identically distributed for $j = 1, 2, \ldots, n$, $G \in \mathcal{G}_p$, and by (5.5.5). From (5.5.19), (5.5.23), and the basic lemma on p. 277 of [9] which says that if

$$S_{n,n} = \sum_{j=1}^{n} Z_{n,j},$$

where the summands are independent random variables, and


$$\frac{1}{n^2} \sum_{j=1}^{n} E Z_{n,j}^2 \to 0, \quad \text{then} \quad \mathcal{L}\Big(\frac{S_{n,n}}{n}\Big) \to \mathcal{L}(0),$$

we have

(5.5.24)
$$\frac{S_{n,n}}{n} = \frac{1}{n} \sum_{j=1}^{n} Y_{n,j} - b_n \xrightarrow{P} 0.$$


This implies from (5.5.19) that

(5.5.25)
$$\bar{Y}_n = \frac{1}{n} \sum_{j=1}^{n} Y_{n,j} \xrightarrow{P} E(\Lambda \mid X = x) \cdot f_G(x).$$

Since $f_G(x) > 0$ for every $x \in S$, its consistent estimator $f_n(x)$ given by (4.6.10) with $r = 1$ is also strictly positive. Thus

(5.5.26)
$$\frac{\bar{Y}_n}{f_n(x)} \xrightarrow{P} E(\Lambda \mid X = x),$$

and from (5.5.3) the empirical Bayes procedure is given by

(5.5.27)  $\delta_n(x) = d_j$, where $j$ is any integer $1, 2, \ldots, k$ such that
$$\frac{\bar{Y}_{n,j}}{f_{n,j}(x_j)} = \max_{1 \le i \le k} \frac{\bar{Y}_{n,i}}{f_{n,i}(x_i)},$$

where for each $i$, $\bar{Y}_{n,i}$ is computed as $\bar{Y}_n$ in (5.5.25) for the i'th population as the typical population and $f_{n,i}(x_i)$ is given by (4.6.10).



If  the  present  observation  x  is  also  a  vector  with  two 


observations, then an average of the two estimators obtained by first suppressing one and then the other component of $x$ could be used, and the convergence in probability required for empirical Bayes estimators would still be obtained. A generalization is possible to the case in which the present observation is a vector of dimension $q$, and the prior observations have dimension $q+1$.
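For the continuous case with a scalar present observation, the ratio in (5.5.26) can be sketched as follows. The density estimator (4.6.10) is not reproduced in this section, so the sketch substitutes a plain histogram estimate over the same interval $I^{(n)}(x)$; that substitution, the function names, and the data layout are assumptions made for illustration only.

```python
import math


def window(x, n):
    """Interval [q / sqrt(n), (q + 1) / sqrt(n)) of the partition that contains x."""
    width = 1.0 / math.sqrt(n)
    q = math.floor(x / width)
    return q * width, (q + 1) * width


def posterior_mean_estimate_continuous(pairs, x):
    """Ratio of (5.5.26): mean of Y_{n,j} divided by a density estimate at x.

    pairs : list of prior observation pairs (X_{j,1}, X_{j,2})  (assumed layout)
    """
    n = len(pairs)
    lo, hi = window(x, n)
    root_n = math.sqrt(n)

    # (5.5.16): Y_{n,j} = sqrt(n) * X_{j,2} when X_{j,1} lies in I^(n)(x), else 0.
    y_bar = sum(root_n * x2 for (x1, x2) in pairs if lo <= x1 < hi) / n

    # ASSUMPTION: histogram estimate of f_G(x) on the same window,
    # standing in for the estimator (4.6.10), which is not shown here.
    f_hat = root_n * sum(1 for (x1, _) in pairs if lo <= x1 < hi) / n

    if f_hat == 0.0:
        return 0.0  # no prior observation fell in the window
    return y_bar / f_hat
```

With this particular stand-in the factor $n^{1/2}$ cancels, and the estimate reduces to the average of the second components whose first component falls in the window around $x$.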


CHAPTER VI


COMPOUND  DECISION  PROBLEM 


The compound decision problem, which is closely related to the empirical Bayes problem, occurs when one is confronted with $n$ individual decisions about some unknown parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$. The parameters are considered to be an unknown sequence of constants and not the realizations of $n$ independent random variables with distribution function $G$. Information concerning the frequency distribution of the parameters is obtained from the sequence of observations $x_1, x_2, \ldots$, and the aim here is to approximate the decision function which would be optimal if the frequency distribution of the parameters were known in advance.

Early  work  on  this  problem  dealt  with  the  case  in  which  the 
component  decisions  are  of  the  simple  versus  simple  hypothesis  testing 
type and can therefore be stated in terms of testing whether $\lambda = 0$ or $\lambda = 1$. If decision functions are allowed to depend on the data from
all  n  components,  we  have  the  nonsequential  case  which  was  considered 
by  Hannan  and  Robbins  [4].  Samuel  [19],  [21]  considers  the  sequential 
case  where  it  is  assumed  that  at  the  time  a  decision  is  made  in  any 
particular  component  problem  the  available  information  includes  the 
data  obtained  in  all  previous  component  decision  problems  in  the  sequence. 

Let $X$ be a random variable which is known to have one of two distinct distributions $F_\lambda(x)$ for $\lambda = 0$ or $1$. On the basis of a


single observation of $X$ it is required to decide whether the true value of the unknown parameter is 0 or 1.

Consider now statistical decision problems of the same formal structure which are in a large group. That is, we have a sequence of independent random variables $X_1, X_2, \ldots$ and parameter values $\lambda_1, \lambda_2, \ldots$, where each $\lambda_i$ is 0 or 1, and $X_i$ has the distribution function $F_{\lambda_i}(x_i)$ which is given in terms of known densities $f_{\lambda_i}(x_i)$ with respect to some measure $\mu$. The sequence of $\lambda$'s is unknown, and we are required to decide, for each $i = 1, 2, \ldots$, whether $\lambda_i = 0$ or 1.

In the sequential case the decision about $\lambda_i$ may depend on the observed values $\mathbf{x}_i = (x_1, x_2, \ldots, x_i)$ of $\mathbf{X}_i = (X_1, X_2, \ldots, X_i)$.

The possible actions for any decision are $d_1$, where the experimenter says "$\lambda = 0$", and $d_2$, where the experimenter says "$\lambda = 1$". Our loss structure is defined by

(6.1)
$$\begin{aligned}
L(d_1, \lambda = 0) &= 0, & L(d_1, \lambda = 1) &= a, \\
L(d_2, \lambda = 0) &= b, & L(d_2, \lambda = 1) &= 0,
\end{aligned}$$

where $a > 0$ and $b > 0$.

Let $\delta$ be a decision function for the component problem. Then $\delta$ is a measurable function with $0 \le \delta(x) \le 1$, where, when $X = x$ is observed, one says "$\lambda = 1$" with probability $\delta(x)$ and "$\lambda = 0$" with probability $1 - \delta(x)$. Then the risk of $\delta$ is given by




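(6.2)
$$R(\delta, \lambda) = \begin{cases} b\, E_0(\delta(X)), & \text{for } \lambda = 0, \\ a\, E_1(1 - \delta(X)), & \text{for } \lambda = 1, \end{cases}$$

where $E_\lambda$ denotes expectation with respect to $F_\lambda$ for $\lambda = 0, 1$.

If $\lambda_i$ is the realization of a random variable $\Lambda$ with the "a priori distribution" $P(\Lambda = 1) = \eta = 1 - P(\Lambda = 0)$, then the overall expected loss will be

(6.3)
$$\begin{aligned}
R(\delta, \eta) &= \eta R(\delta, 1) + (1 - \eta) R(\delta, 0) \\
&= \eta a \int (1 - \delta(x)) f_1(x)\, d\mu(x) + (1 - \eta) b \int \delta(x) f_0(x)\, d\mu(x) \\
&= \int [(1 - \eta) b f_0(x) - \eta a f_1(x)]\, \delta(x)\, d\mu(x) + \int \eta a f_1(x)\, d\mu(x) \\
&= \eta a + \int [(1 - \eta) b f_0(x) - \eta a f_1(x)]\, \delta(x)\, d\mu(x).
\end{aligned}$$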

For fixed $\eta$, (6.3) is minimized with respect to all possible decision functions $\delta$ by any $\delta_\eta$ of the form

(6.4)
$$\delta_\eta(x) = \begin{cases} 1, & \text{if } (1 - \eta) b f_0(x) < \eta a f_1(x), \\ 0, & \text{if } (1 - \eta) b f_0(x) > \eta a f_1(x), \\ \text{arbitrary in } [0, 1], & \text{if } (1 - \eta) b f_0(x) = \eta a f_1(x). \end{cases}$$

Denote by $\delta_\eta$ the non-randomized function

(6.5)
$$\delta_\eta(x) = \begin{cases} 1, & \text{if } (1 - \eta) b f_0(x) < \eta a f_1(x), \\ 0, & \text{otherwise}. \end{cases}$$

The rules $\delta_\eta$ which minimize (6.3) are called Bayes with respect to the a priori distribution $\eta$, and the function

(6.6)  $R(\eta) = R(\delta_\eta, \eta) = \min_\delta R(\delta, \eta)$

is called the Bayes envelope function.
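To make (6.3) through (6.6) concrete, the following Python sketch evaluates the non-randomized Bayes rule and the Bayes envelope when the sample space is finite and $\mu$ is counting measure. The finite sample space, the list representation of $f_0$ and $f_1$, and the function names are assumptions chosen for illustration, not part of the original development.

```python
def bayes_rule(eta, f0, f1, a, b):
    """Non-randomized rule (6.5): 1 where (1-eta)*b*f0(x) < eta*a*f1(x), else 0.

    f0, f1 : lists of probabilities over a common finite sample space (assumed).
    """
    return [1 if (1 - eta) * b * p0 < eta * a * p1 else 0 for p0, p1 in zip(f0, f1)]


def risk(delta, eta, f0, f1, a, b):
    """Overall expected loss R(delta, eta) of (6.3), with mu = counting measure."""
    return sum(
        eta * a * (1 - d) * p1 + (1 - eta) * b * d * p0
        for d, p0, p1 in zip(delta, f0, f1)
    )


def bayes_envelope(eta, f0, f1, a, b):
    """Bayes envelope R(eta) of (6.6): the risk of the Bayes rule against eta."""
    return risk(bayes_rule(eta, f0, f1, a, b), eta, f0, f1, a, b)
```

For example, with `f0 = [0.8, 0.2]`, `f1 = [0.3, 0.7]`, `a = b = 1` and `eta = 0.5`, the rule says "$\lambda = 1$" only at the second sample point, and the envelope equals $\min(0.4, 0.15) + \min(0.1, 0.35) = 0.25$.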

Returning to our sequence of decision problems, let $\Omega^\infty$ = {all possible infinite sequences of 0's and 1's}, and $\Omega^n$ = {all $2^n$ n-vectors of 0's and 1's}. For any $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots) \in \Omega^\infty$, let $\boldsymbol{\lambda}_n = (\lambda_1, \lambda_2, \ldots, \lambda_n) \in \Omega^n$ denote its initial n-vector with corresponding random variable $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$, where $X_i$ is distributed according to $F_{\lambda_i}$ with density $f_{\lambda_i}(x_i)$ and is independent of the other $X$'s and $\lambda$'s. No relationship among the $\lambda$'s is assumed.

By an (n-step) compound decision rule we mean any n-vector $D_n = (\delta_1, \delta_2, \ldots, \delta_n)$ of measurable functions where, in the sequential case, $\delta_i = \delta_i(x_1, x_2, \ldots, x_i)$ and where $0 \le \delta_i \le 1$ is the probability with which one decides $\lambda_i = 1$ when $\mathbf{X}_i = \mathbf{x}_i$ has been observed. The risk of $D_n$ at the point $\boldsymbol{\lambda}_n$ is defined by

(6.7)
$$R(D_n, \boldsymbol{\lambda}_n) = \frac{1}{n} \sum_{i=1}^{n} R(\delta_i, \lambda_i),$$


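where

(6.8)
$$R(\delta_i, \lambda_i) = \int [a \lambda_i (1 - \delta_i(\mathbf{x}_i)) + b (1 - \lambda_i) \delta_i(\mathbf{x}_i)]\, f_{\boldsymbol{\lambda}_i}(\mathbf{x}_i)\, d\mu^i.$$

Denote the random loss incurred in the i'th decision by $L(\delta_i(\mathbf{x}_i), \lambda_i)$. Corresponding to (6.7) we have

(6.9)
$$L(D_n, \boldsymbol{\lambda}_n) = \frac{1}{n} \sum_{i=1}^{n} L(\delta_i(\mathbf{x}_i), \lambda_i).$$
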

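Let $h(x)$ be an unbiased estimate of $\lambda$, and define for $i = 1, 2, \ldots$

(6.10)
$$p_i = p_i(\mathbf{x}_i) = \begin{cases} 0, & \text{if } \dfrac{1}{i} \sum_{j=1}^{i} h(x_j) \le 0, \\[2ex] \dfrac{1}{i} \sum_{j=1}^{i} h(x_j), & \text{if } 0 \le \dfrac{1}{i} \sum_{j=1}^{i} h(x_j) \le 1, \\[2ex] 1, & \text{if } 1 \le \dfrac{1}{i} \sum_{j=1}^{i} h(x_j), \end{cases}$$
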

and set $p_0 = \tfrac{1}{2}$. Consider the rule with


(6.11)
$$\delta^*_i(\mathbf{x}_i) = \delta_{p_{i-1}}(x_i) = \begin{cases} 1, & \text{if } f_0(x_i)\, b\, (1 - p_{i-1}(\mathbf{x}_{i-1})) < f_1(x_i)\, a\, p_{i-1}(\mathbf{x}_{i-1}), \\ 0, & \text{otherwise}, \end{cases}$$

and let $D^*_n = (\delta^*_1, \delta^*_2, \ldots, \delta^*_n)$. Letting

(6.12)
$$\nu_i = \frac{1}{i} \sum_{j=1}^{i} \lambda_j,$$

$D^*_n$ uses a rule which is Bayes with respect to the estimate $p_{i-1}$ of $\nu_{i-1}$ to decide on $\lambda_i$. Samuel proves (Theorem 1 in [19]) that if $R(\eta)$ has a derivative for $0 < \eta < 1$, then $\{D^*_n\}$ defined by (6.11) is such that for every $\varepsilon > 0$ there exists an $N(\varepsilon)$ such that for all $n > N(\varepsilon)$,

(6.13)  $R(D^*_n, \boldsymbol{\lambda}_n) - R(\nu_n) < \varepsilon$ uniformly in $\boldsymbol{\lambda}_n \in \Omega^n$.


Thus even if the sequence $\lambda_1, \lambda_2, \ldots$ is chosen arbitrarily by Nature, our average performance in using $D^*_n$ on the first $n$ decisions will, for large $n$, be almost as good as if we had used throughout the Bayes decision function $\delta_{\nu_n}$ corresponding to the actual proportion $\nu_n$ of ones among $\lambda_1, \lambda_2, \ldots, \lambda_n$.
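The sequential rule built from (6.10) and (6.11) can be illustrated with a short Python sketch. It assumes a specific and purely hypothetical model: each $X_i$ is Bernoulli with success probability `q0` under $\lambda = 0$ and `q1` under $\lambda = 1$, so that $h(x) = (x - q_0)/(q_1 - q_0)$ is unbiased for $\lambda$. The starting value `p0 = 0.5`, the function name, and the argument layout are likewise assumptions.

```python
def sequential_compound_rule(xs, q0, q1, a=1.0, b=1.0, p0=0.5):
    """Decisions of the rule (6.11) for 0/1 observations xs.

    ASSUMED model: P(X = 1 | lambda = 0) = q0 and P(X = 1 | lambda = 1) = q1,
    with q0 != q1, so h(x) = (x - q0) / (q1 - q0) is unbiased for lambda.
    """
    def density(q, x):
        # Bernoulli probability mass function
        return q if x == 1 else 1.0 - q

    decisions = []
    running_sum_h = 0.0
    p_prev = p0  # plays the role of p_{i-1}
    for i, x in enumerate(xs, start=1):
        # (6.11): Bayes rule against the current estimate p_{i-1}
        says_one = density(q0, x) * b * (1.0 - p_prev) < density(q1, x) * a * p_prev
        decisions.append(1 if says_one else 0)
        # (6.10): running mean of h, truncated to the interval [0, 1]
        running_sum_h += (x - q0) / (q1 - q0)
        p_prev = min(1.0, max(0.0, running_sum_h / i))
    return decisions
```

Each decision uses only the observations seen before it plus the current one, which is the defining feature of the sequential case; the estimate of the proportion of ones is updated after the decision is recorded.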

Suppose that $R'(\eta)$ does not exist for some value $\eta$. By means of randomization using a uniformly distributed random variable, Samuel finds decision rules whose Bayes envelope function is some "smoothed" version of the original $R(\eta)$ and stays sufficiently close to $R(\eta)$.

Let $Z = (z_1, z_2)$ be a random variable which is uniformly distributed over the unit square. Assuming that all $Z$'s are independent of the other random variables $X_i$, $i = 1, 2, \ldots, n$, let

(6.14)
$$m(Z, \mathbf{x}_i) = \frac{p_i(\mathbf{x}_i) + i^{-1/4} z_1}{1 + i^{-1/4} (z_1 + z_2)}, \qquad i = 1, 2, \ldots,$$

where $p_i$ is defined in (6.10), and $m(Z, \mathbf{x}_0) = \tfrac{1}{2}$. Consider the

rule with

(6.15)  $\hat{\delta}_i(\mathbf{x}_i) = \delta_{m_{i-1}}(x_i)$,

where $m_{i-1} = m(Z, \mathbf{x}_{i-1})$, and let $\hat{D}_n = (\hat{\delta}_1(\mathbf{x}_1), \hat{\delta}_2(\mathbf{x}_2), \ldots, \hat{\delta}_n(\mathbf{x}_n))$ for $n = 1, 2, \ldots$. Samuel proves (Theorem 2 in [19]) that the sequence of rules $\{\hat{D}_n\}$ is such that for every $\varepsilon > 0$ there exists an $N(\varepsilon)$ such that for all $n > N(\varepsilon)$,

(6.16)  $R(\hat{D}_n, \boldsymbol{\lambda}_n) - R(\nu_n) < \varepsilon$ uniformly in $\boldsymbol{\lambda}_n \in \Omega^n$.

$\hat{D}_n$ is not an admissible rule since no admissible rule involves artificial randomization of the above kind. $\hat{D}_n$ is not optimal in any sense other than being optimal in the limit.

Samuel [21] proves analogous results to (6.13) and (6.16) when the loss given by (6.9) is used. It is shown (Theorems 1 and 2 in [21]) that for any sequence of $\lambda$-values, the difference between the loss incurred by $\hat{D}_n$, $L(\hat{D}_n, \boldsymbol{\lambda}_n)$, and $R(\nu_n)$ converges to zero in probability, and if $R(\eta)$ has a derivative for $0 < \eta < 1$, a corresponding statement holds with probability one for $D^*_n$.

For the nonsequential case where all $n$ random variables $X_1, X_2, \ldots, X_n$ may be observed before the decisions on $\lambda_i$, $i = 1, 2, \ldots, n$, have to be made, Hannan and Robbins derive (Theorem 3 in [4]) a decision rule for which $P_{\boldsymbol{\lambda}}(L(\boldsymbol{\lambda}_n) - R(\nu_n) < \varepsilon \text{ for all } n \ge N) > 1 - \varepsilon$ uniformly in $\boldsymbol{\lambda} \in \Omega^\infty$, where $L(\boldsymbol{\lambda}_n)$ denotes the random loss incurred by their rule in the first $n$ decisions. Hannan and Robbins also show (Theorem 4 in [4]) that for the nonsequential case there exists a sequence of rules $\{D_n\}$ with the property that given any $\varepsilon > 0$ there exists an $N(\varepsilon)$ such that for every $n > N(\varepsilon)$,

(6.17)  $R(D_n, \boldsymbol{\lambda}_n) - R(\nu_n) < \varepsilon$ uniformly in $\boldsymbol{\lambda}_n \in \Omega^n$,

which corresponds to (6.16) in the sequential case.

Van Ryzin [24] generalizes and strengthens the result (6.16) in the nonsequential problem to the case where each component problem consists of making one of $n$ decisions based on an observation from one of $m$ distributions. Let $X$ be a random variable which is known to have one of $m$ possible distributions $F_\lambda$, where $\lambda$ is in the finite parameter space $\Omega = \{1, 2, \ldots, m\}$. After observing $X$ a decision is made with loss $L(\lambda, d)$ if decision $d$ is made when $X$ is distributed as $F_\lambda$, where $\lambda = 1, 2, \ldots, m$, and $d = 1, 2, \ldots, n$. Thus we have an $m \times n$ loss matrix $L(\lambda, d)$. If $X_i$, $i = 1, 2, \ldots, N$, are $N$ independent observations with $X_i$ distributed according to $F_{\lambda_i}$ with $\lambda_i \in \Omega$, then a decision $d_i \in \mathcal{D}$ based on all $N$ observations is to be made for each of the $N$ component problems. This is a finite compound decision problem.

Let there exist a $\sigma$-finite measure $\mu$ which dominates $\{F_1, F_2, \ldots, F_m\}$ such that the densities

(6.18)
$$f_\lambda(x) = \Big(\frac{dF_\lambda}{d\mu}\Big)(x) \le K \quad \text{a.e. } (\mu)$$

for some $K < \infty$, and let $f(x) = (f_1(x), f_2(x), \ldots, f_m(x))$ be the vector of densities in (6.18).

A randomized decision function for the compound decision problem will be any $N \times n$ matrix of measurable functions $D(\mathbf{x}) = (\delta^k_d(\mathbf{x}))$ such that for $k = 1, 2, \ldots, N$ and $d = 1, 2, \ldots, n$,

$$\delta^k_d(\mathbf{x}) = P\{d_k = d \mid \mathbf{X} = \mathbf{x}\},$$

and

$$\sum_{d=1}^{n} \delta^k_d(\mathbf{x}) = 1,$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ is the vector of observations. Denote the k'th row of $D(\mathbf{x})$ by $\delta^{(k)}(\mathbf{x}) = (\delta^k_1(\mathbf{x}), \delta^k_2(\mathbf{x}), \ldots, \delta^k_n(\mathbf{x}))$. Let $\Omega^N$ be the set of all N-tuples $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_N)$ where $\lambda_k \in \Omega$. For the $m \times n$ matrix of losses $L(\lambda, d)$, the rows will be denoted by $L_\lambda$ and the columns by $L^d$ for $\lambda = 1, 2, \ldots, m$, and $d = 1, 2, \ldots, n$. If $E^m$ is m-dimensional Euclidean space, and $y = (y_1, y_2, \ldots, y_m)$ and $z = (z_1, z_2, \ldots, z_m)$ are vectors in $E^m$, then the vector $yz = (y_1 z_1, y_2 z_2, \ldots, y_m z_m)$, and the inner product is

$$(y, z) = \sum_{i=1}^{m} y_i z_i.$$


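The risk function $R(\boldsymbol{\lambda}, D)$ for the compound decision procedure $D(\mathbf{x})$ is defined to be the average of the component risks

(6.19)
$$\begin{aligned}
R_k(\boldsymbol{\lambda}, D) &= E_{\boldsymbol{\lambda}} \Big[ \sum_{d=1}^{n} L(\lambda_k, d)\, \delta^k_d(\mathbf{X}) \Big] \\
&= E_{\boldsymbol{\lambda}} (L_{\lambda_k}, \delta^{(k)}(\mathbf{X})),
\end{aligned}$$

for each subproblem $k = 1, 2, \ldots, N$, where $E_{\boldsymbol{\lambda}}$ denotes expectation with respect to $\prod_{k=1}^{N} F_{\lambda_k}$. Thus

(6.20)
$$R(\boldsymbol{\lambda}, D) = \frac{1}{N} \sum_{k=1}^{N} R_k(\boldsymbol{\lambda}, D) = E_{\boldsymbol{\lambda}} \Big[ \frac{1}{N} \sum_{k=1}^{N} (L_{\lambda_k}, \delta^{(k)}(\mathbf{X})) \Big].$$
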

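The empirical distribution on $\Omega$ is given by the vector $p(\boldsymbol{\lambda}) = (p_1(\boldsymbol{\lambda}), p_2(\boldsymbol{\lambda}), \ldots, p_m(\boldsymbol{\lambda}))$, where for $\boldsymbol{\lambda} \in \Omega^N$ and $\lambda = 1, 2, \ldots, m$ we define

(6.21)  $p_\lambda(\boldsymbol{\lambda}) = \frac{1}{N}$ (number of $\lambda_k = \lambda$, $k \le N$).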


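Suppose that the decision function $D(\mathbf{x})$ is simple; i.e. there exist functions $\delta_d(\cdot)$, $d = 1, 2, \ldots, n$, such that $\delta^{(k)}(\mathbf{x}) = (\delta_1(x_k), \delta_2(x_k), \ldots, \delta_n(x_k))$ for $k = 1, 2, \ldots, N$, and we write $\delta = (\delta_1, \delta_2, \ldots, \delta_n)$. Then $\delta_d(x) \ge 0$ for $d = 1, 2, \ldots, n$ is a set of measurable functions such that

$$\sum_{d=1}^{n} \delta_d(x) = 1.$$

The risk incurred in using procedure $\delta$ is from (6.20)

(6.22)
$$\begin{aligned}
R(\boldsymbol{\lambda}, \delta) &= \frac{1}{N} \sum_{k=1}^{N} E_{\lambda_k} (L_{\lambda_k}, \delta(X_k)) \\
&= \sum_{\lambda=1}^{m} p_\lambda(\boldsymbol{\lambda})\, E_\lambda (L_\lambda, \delta(X)) \\
&= \sum_{\lambda=1}^{m} p_\lambda(\boldsymbol{\lambda})\, E_\lambda \Big[ \sum_{d=1}^{n} L(\lambda, d)\, \delta_d(X) \Big] \\
&= \sum_{\lambda=1}^{m} p_\lambda(\boldsymbol{\lambda})\, \rho_\lambda(\delta) \\
&= (p(\boldsymbol{\lambda}), \rho(\delta)),
\end{aligned}$$

where

$$\rho_\lambda(\delta) = E_\lambda (L_\lambda, \delta(X)) = E_\lambda \Big[ \sum_{d=1}^{n} L(\lambda, d)\, \delta_d(X) \Big],$$

$E_\lambda$ denoting expectation with respect to $F_\lambda$, and $\rho(\delta) = (\rho_1(\delta), \rho_2(\delta), \ldots, \rho_m(\delta))$.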


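If $\xi = (\xi_1, \xi_2, \ldots, \xi_m)$ is any vector in $E^m$, let

(6.23)  $\psi(\xi, \delta) = (\xi, \rho(\delta))$.

Thus when $\xi = p(\boldsymbol{\lambda})$, $\psi(\xi, \delta)$ is the risk function for $\delta$ given in (6.22). From (6.18) and (6.23) we have

(6.24)
$$\begin{aligned}
\psi(\xi, \delta) &= \sum_{\lambda=1}^{m} \xi_\lambda\, E_\lambda \Big\{ \sum_{d=1}^{n} L(\lambda, d)\, \delta_d(X) \Big\} \\
&= \sum_{\lambda=1}^{m} \sum_{d=1}^{n} \xi_\lambda \int L(\lambda, d)\, f_\lambda(x)\, \delta_d(x)\, d\mu(x) \\
&= \int \Big\{ \sum_{d=1}^{n} (\xi, L^d f(x))\, \delta_d(x) \Big\}\, d\mu(x).
\end{aligned}$$
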


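For fixed $\xi$, (6.24) will be a minimum for any vector function $\delta_\xi = (\delta_{\xi,1}, \delta_{\xi,2}, \ldots, \delta_{\xi,n})$ of the form

(6.25)
$$\delta_{\xi,d}(x) = \begin{cases} 0, & \text{if } (\xi, L^d f(x)) > \min\limits_{1 \le j \le n} (\xi, L^j f(x)), \\ 1, & \text{if } (\xi, L^d f(x)) < \min\limits_{j \ne d} (\xi, L^j f(x)), \\ \text{arbitrary}, & \text{if } (\xi, L^d f(x)) = \min\limits_{j \ne d} (\xi, L^j f(x)), \end{cases}$$

such that $\delta_{\xi,d}(x) \ge 0$ for $d = 1, 2, \ldots, n$, and

$$\sum_{d=1}^{n} \delta_{\xi,d}(x) = 1 \quad \text{a.e. } \mu.$$

Thus if $\xi$ is a bona fide a priori distribution, then such a $\delta_\xi$ would be a decision procedure Bayes against $\xi$.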


Any randomized procedure of the form (6.25) may be replaced by the non-randomized version

(6.26)
$$\delta^*_{\xi,d}(x) = \begin{cases} 1, & \text{if } d \text{ is the smallest integer for which } (\xi, L^d f(x)) = \min\limits_{1 \le j \le n} (\xi, L^j f(x)), \\ 0, & \text{otherwise}, \end{cases}$$

which also minimizes $\psi(\xi, \delta)$ for fixed $\xi$.
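As an illustration of (6.26), the following Python sketch picks, for one observation, the smallest decision index minimizing $(\xi, L^d f(x))$. The nested-list representation of the loss matrix, the 0-based indices, and the function name are assumptions made for the sketch.

```python
def bayes_decision(xi, loss, densities_at_x):
    """Non-randomized rule (6.26) at a single observation x.

    xi             : weights (xi_1, ..., xi_m)
    loss           : m x n nested list, loss[lam][d] = L(lambda, d)  (0-based, assumed)
    densities_at_x : (f_1(x), ..., f_m(x))
    Returns the chosen decision index (0-based) and all n scores.
    """
    m, n = len(loss), len(loss[0])
    # score[d] = (xi, L^d f(x)) = sum over lambda of xi_lambda * L(lambda, d) * f_lambda(x)
    scores = [
        sum(xi[lam] * loss[lam][d] * densities_at_x[lam] for lam in range(m))
        for d in range(n)
    ]
    # list.index returns the first position of the minimum: the smallest such d
    return scores.index(min(scores)), scores
```

With 0-1 loss `[[0, 1], [1, 0]]`, weights `[0.5, 0.5]` and densities `[0.8, 0.3]` at the observed point, the scores are 0.15 and 0.40, so the first decision is taken; when the scores tie, the first decision is again chosen, as (6.26) prescribes.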


If $L_1(\mu)$ and $L_2(\mu)$ are the function spaces of $\mu$-integrable and $\mu$-square integrable functions respectively, then $f_\lambda(x)$ for $\lambda = 1, 2, \ldots, m$ are in $L_1(\mu)$ and $L_2(\mu)$ since they are bounded. Define

$$S^{(m)} = \Big\{ \eta : \eta \in E^m,\; \eta_\lambda \ge 0,\; \sum_{\lambda=1}^{m} \eta_\lambda = 1 \Big\}$$

to be the simplex in $E^m$, and for $\eta \in S^{(m)}$ define the probability mixture

$$F_\eta = \sum_{\lambda=1}^{m} \eta_\lambda F_\lambda$$

with $\mu$-density $f_\eta(x) = (\eta, f(x))$, and let $\mathcal{F} = \{F_\eta \mid \eta \in S^{(m)}\}$ be the class of all mixtures.

A vector function $h = (h_1(x), h_2(x), \ldots, h_m(x))$ into $E^m$ with coordinate functions $h_\lambda \in L_1(\mu)$ is an unbiased estimate for the class $\mathcal{F}$ if $E_\eta\{h(X)\} = \eta$ for all $\eta \in S^{(m)}$, where $E_\eta$ denotes expectation with respect to the mixture $F_\eta$, and if $h$ exists, the class $\mathcal{F}$ is called estimable. Denote the class of all unbiased estimates for the class $\mathcal{F}$ by $\mathcal{H}$ and the subclass of $\mathcal{H}$ for which $h_j \in L_p(\mu)$ for $j = 1, 2, \ldots, m$, by $\mathcal{H}_p$.


Define the random variable

(6.27)
$$\bar{h}(\mathbf{X}) = \frac{1}{N} \sum_{k=1}^{N} h(X_k),$$

where $h \in \mathcal{H}$ and $\mathbf{X} = (X_1, X_2, \ldots, X_N)$. Van Ryzin shows that $\bar{h}(\mathbf{X})$ is an unbiased estimate of the empirical distribution $p(\boldsymbol{\lambda})$ for all $\boldsymbol{\lambda} \in \Omega^N$ and for $h \in \mathcal{H}$ defines the non-simple, non-randomized decision function

(6.28)
$$\delta^*_{\bar{h},d}(x_k) = \begin{cases} 1, & \text{if } d \text{ is the smallest integer for which } (\bar{h}, L^d f(x_k)) = \min\limits_{1 \le j \le n} (\bar{h}, L^j f(x_k)), \\ 0, & \text{otherwise}, \end{cases}$$

for $d = 1, 2, \ldots, n$, which results from substituting $\bar{h}(\mathbf{X})$ for $p(\boldsymbol{\lambda})$ in (6.26). The resulting non-simple, non-randomized decision procedure $D^*$ consists of the $N$ vector functions

$$\delta^{(k)}(\mathbf{x}) = \delta^*_{\bar{h}}(x_k) = (\delta^*_{\bar{h},1}(x_k), \delta^*_{\bar{h},2}(x_k), \ldots, \delta^*_{\bar{h},n}(x_k))$$

for $k = 1, 2, \ldots, N$. Let $\phi(p(\boldsymbol{\lambda})) = \inf_\delta \psi(p(\boldsymbol{\lambda}), \delta) = \inf_\delta R(\boldsymbol{\lambda}, \delta)$. Then Van Ryzin proves (Theorem 2 in [24]) that if $h \in \mathcal{H}$ and $E_\lambda |h_j(X)|^p < \infty$ for $\lambda, j = 1, 2, \ldots, m$, then

(6.29)  $R(\boldsymbol{\lambda}, D^*) - \phi(p(\boldsymbol{\lambda})) \le c\, N^{-1/2}$,

where $c$ is independent of $\boldsymbol{\lambda} \in \Omega^N$ for all $N$. Thus Van Ryzin has found sufficient conditions for a uniform bound on the difference in risks (the regret function) of a certain compound procedure and a best "simple" procedure which is Bayes against the empirical distribution on the component parameter space.
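The plug-in procedure of (6.27) and (6.28) can be sketched for the smallest interesting case, $m = 2$ distributions. The sketch assumes, hypothetically, that the two distributions are Bernoulli with success probabilities `q[0]` and `q[1]`; then $h(x) = (1 - g(x), g(x))$ with $g(x) = (x - q_0)/(q_1 - q_0)$ is an unbiased estimate in the sense defined above. The names and the 0-based indexing are illustrative only.

```python
def h_bar(xs, q):
    """Average (6.27) of the unbiased estimate h over all N observations.

    ASSUMED model: X is Bernoulli(q[0]) or Bernoulli(q[1]) with q[0] != q[1].
    """
    g_values = [(x - q[0]) / (q[1] - q[0]) for x in xs]
    g_mean = sum(g_values) / len(g_values)
    return [1.0 - g_mean, g_mean]


def plug_in_decisions(xs, q, loss):
    """Rule (6.28): apply (6.26) with h_bar substituted for p(lambda).

    loss : 2 x n nested list, loss[lam][d] = L(lambda, d)  (0-based, assumed)
    Returns h_bar and the list of chosen decision indices, one per observation.
    """
    weights = h_bar(xs, q)  # uses all N observations: the nonsequential case
    decisions = []
    for x in xs:
        densities = [qi if x == 1 else 1.0 - qi for qi in q]
        scores = [
            sum(weights[lam] * loss[lam][d] * densities[lam] for lam in range(2))
            for d in range(len(loss[0]))
        ]
        decisions.append(scores.index(min(scores)))  # smallest minimizing d
    return weights, decisions
```

Unlike the sequential rule, every decision here depends on the whole sample through $\bar{h}$, which is why the procedure is non-simple even though each component decision has the form (6.26).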
In the above results the family of distribution functions governing the observation is assumed to be finite, and the main results are concerned only with the convergence to zero of the difference between the average risk and a certain "optimal" goal as the number of component problems becomes large.


In more recent work Swain [23] considers standard (infinite state) estimation problems with squared error loss. Let $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots)$ be a countably infinite vector whose components $\lambda_i$ are elements of some finite interval $\Omega$ of the real line; i.e. $-\infty < \alpha \le \lambda_i \le \beta < \infty$ for $i = 1, 2, \ldots$. Let $\mathcal{F} = \{f_\lambda(x) : \lambda \in \Omega\}$ be a family of known probability density functions with parameter $\lambda$. If $\boldsymbol{\lambda}$ is unknown, we want to estimate $\lambda_i$ for each $i$ with each estimate being based on the independent observations $x_j$, $j = 1, 2, \ldots, i$. Thus for each $i = 1, 2, \ldots$, a non-randomized estimator $\varphi_i(\mathbf{x}_i)$ is sought for $\lambda_i$, where $\mathbf{x}_i = (x_1, x_2, \ldots, x_i)$ is the vector of observations.

Assuming that we have a squared error loss, the loss when $\varphi_i$ is an estimate of $\lambda_i$ is $(\varphi_i - \lambda_i)^2$. The risk of the estimator $\varphi_i$ is defined to be the expected loss $E(\varphi_i(\mathbf{X}_i) - \lambda_i)^2$, and the average risk for $n$ estimations is then

$$\frac{1}{n} \sum_{i=1}^{n} E(\varphi_i(\mathbf{X}_i) - \lambda_i)^2.$$

For specified $\Omega$ and $\mathcal{F}$ a decision procedure $\varphi = (\varphi_1, \varphi_2, \ldots)$ is found by Swain which, on the basis of its average risk for the first $n$ estimations, is in some sense optimal for large $n$. Samuel [22] also deals with the sequential compound decision problem when the component problem is an estimation problem.

